The considerable developments in statistical methods and the increasing
range of their application in recent years have led
inevitably  to  a  specialization  of  the  subject  matter.  Thus  the  fields  of 
experimental  design  and  of  sampling  have  already  emerged  as  separate 
topics  of  study. 

A  treatise  dealing  specifically  with  the  relations  among  two  or  more 
variables,  and  their  applications  in  the  interpretation  of  experimental 
results, seems to be long overdue. Many experiments either are designed
to study the relations between variables, or depend on such relations
for their interpretation. These and other applications are covered
by the general methods of regression analysis.

Two  sources  of  confusion  have  hindered  the  fruitful  application  of 
regression analysis to experimental data. One is the viewing of relations
among variables in terms of correlations, which may be appropriate
for the analysis of samples from homogeneous populations but not
for experimental data, where interest lies in the estimation of one variable
from another. The other is the confusion, more common among
theorists than among practical experimenters, between regression relations,
which have a wide range of validity and use, and functional relations,
which are usually difficult to determine and are of limited usefulness.

The material presented in this book is based on the experience of analyzing
experimental results, and has been included because the methods
exemplified have been found useful in actual application. Since many
of the interesting problems arise from the treatment of specific applications,
I have attempted to relate the techniques to these applications
and  to  indicate  not  only  how  the  problem  has  arisen  but  also  to  what 
range  of  problems  the  given  technique  is  applicable. 

The book is addressed primarily to research workers in the experimental
sciences. The problems with which it deals have arisen from
consultation with such workers, and it is hoped that the material presented
here will be not only intelligible to them but also useful in suggesting
better ways in which their observations may be treated. Although
most of the examples given are from the biological sciences, in
which  these  statistical  techniques  have  been  most  consistently  developed, 
there  is  a  growing  need  for,  and  awareness  of,  their  value  in  the  physical 
sciences  and  other  fields;  I  therefore  believe  that  workers  in  these  fields 
will  also  find  the  material  useful. 

As  there  are  now  many  excellent  books  on  general  statistical  methods, 
it  has  been  assumed  that  the  reader  is  familiar  with,  or  has  ready  access 
to,  these  methods.  Accordingly,  I  have  not  presented  the  theory  that 
is  basic  to  the  results  given  here.  To  have  done  so  would  have  made 
the  book  unnecessarily  long.  For  the  same  reason,  I  have  not  treated 
computational methods, although I am aware that computational methods
in common use leave much to be desired. Since the book is addressed
to experimenters rather than to mathematicians, the minimum
of mathematics necessary to develop the methods is used, and the results
are not always presented in their fullest generality or rigor. Nevertheless,
the more mathematical equipment the experimenter has, the more
effective  use  he  will  be  able  to  make  of  the  methods. 

In  the  application  of  statistical  methods  to  actual  situations  the  logical 
problems  must  be  kept  in  mind.  It  is  not  sufficient  to  be  competent 
with techniques; we must know, too, the conditions in which a technique
is both relevant and valid. I have attempted here to point out
when  a  particular  method  is  appropriate,  how  to  make  most  effective 
use  of  the  data,  and  the  precautions  to  be  taken  in  interpreting  data. 
It  is  difficult  to  lay  down  general  rules.  Only  a  wide  background  of 
experience  in  dealing  with  experimental  and  observational  data,  and 
careful  thought  to  drawing  valid  conclusions  from  them,  can  enable  sure 
handling of these problems. Since practical situations will vary greatly,
I have tried, while dealing with typical examples, to point to the principles
underlying the approach, rather than to give general rules.

The plan of the book is somewhat novel. After the first four chapters,
which deal with the determination of regression relationships, we
turn in Chapters 5 and 6 to the important but seldom discussed questions
of choice among regression formulas and the uses of the regression
equation in estimation. Chapter 7, on the analysis of covariance, is
the pivot of the book. The analysis of covariance, introduced as a
variant of multiple regression, is further applied in Chapters 8, 10, and
11 in the development of significance tests for various multivariate problems.
The approach used in these later chapters should prove helpful
in bringing the treatment of important multivariate problems within the
reach of experimenters and practical statisticians generally. These chapters
deal with a wide range of topics: heterogeneous data, simultaneous
equations, and discriminant functions. The final chapter, on linear
functional  relationships,  is  included  to  show  the  distinction  between 
regression  and  functional  relationships  and  to  give  some  applications 
of  the  latter. 

This  book  would  not  have  been  possible  without  the  assistance  of 
numerous friends who have posed problems, discussed methods of analysis,
and read parts of the manuscript. To them I am indeed grateful.
For many of the problems and practical examples I am indebted to
research workers at the Division of Forest Products and other divisions
of the Australian Commonwealth Scientific and Industrial Research
Organization. For the assembling of suitable data, assistance with computing,
and helpful comments I am especially indebted to Nell Ditchburne
of the Division of Mathematical Statistics, C.S.I.R.O. The work
presented in Chapter 9 was sponsored by the Office of Ordnance Research,
United States Army.

E.  J.  Williams 

Mackinac  Island,  Michigan 
September  1959 
Introduction


1.1     GENERAL 

The  subject  of  this  book  is  one  of  the  branches  of  statistics  based  on  the 
method  of  least  squares  and  the  analysis  of  variance.  With  the  increase  in 
the  scope  of  statistical  methods  in  recent  years,  certain  fairly  distinct 
branches  have  been  developed  to  meet  different  needs.  Thus,  on  the  one 
hand,  the  design  of  experiments  is  concerned  with  providing  data  from 
which  the  effects  of  various  factors  and  the  random  errors  affecting  them 
can  be  most  accurately  and  easily  determined.  On  the  other  hand, 
regression  analysis  enables  the  effects  of  various  factors  to  be  evaluated 
from  the  experimental  data  even  when  the  experiment  does  not  follow  a 
simple  pattern,  or  when  the  variables  affecting  the  results  cannot  be 
controlled  in  such  a  manner  as  to  make  possible  a  designed  experiment. 

Thus,  although  the  methods  we  shall  consider  can  all  be  formally 
described  in  terms  of  the  analysis  of  variance,  it  is  more  profitable  to 
consider  separately  the  problems  in  which  the  regression  of  one  variable 
on  others  is  of  interest.  Clearly,  such  a  method  of  analysis  can  be 
adopted,  whether  or  not  the  data  to  be  interpreted  come  from  a  designed 
experiment.  Where  the  experiment  is  designed  to  elucidate  the  effects  of 
certain  factors,  the  effects  of  other  factors  may  be  considered  through  a 
regression analysis, or by means of the technique of the analysis of covariance,
which enables the effects of uncontrolled variables to be allowed
for  and  the  accuracy  of  the  experiment  to  be  consequently  improved. 

In  general,  an  experimenter  who  is  to  interpret  rigorously  the  outcome 
of  an  experiment  will  need  first  to  formulate  the  problem  in  mathematical 
terms  (the  mathematical  model),  then  to  test  the  concordance  of  the 
mathematical  model  in  all  relevant  respects  with  the  data,  and  finally,  if 
the  model  proves  to  be  acceptable,  to  estimate,  or  set  limits  on,  any 
constants  left  unspecified  in  the  model.  Regression  analysis  is  a  means  of 
making  such  an  interpretation  when  the  expected  value  of  one  variable  is 
defined  as  a  function  of  the  observed  values  of  other  variables.  Many 
physical  laws,  both  theoretical  and  empirical,  are  of  this  nature  when  it 
can be assumed that, for practical purposes, the variables are observed
without  error.  However,  in  the  biological  sciences,  and  indeed  in  all  the 
sciences  wherein  the  possibility  of  errors  of  observation  is  admitted,  the 
idea  of  a  relationship  among  errorless  quantities  turns  out  to  be  otiose, 
whereas  the  regression  concept  which  bases  relationships  on  the  quantities 
actually  observed  proves  to  be  exceedingly  useful. 

The  testing  of  the  concordance  between  theory  and  observation  is 
based  on  probability  considerations  which  give  rise  to  significance  tests; 
in  general,  if  one  of  a  set  of  unlikely  outcomes,  specified  along  with  the 
mathematical  model,  is  observed,  the  theory  is  rejected.  The  role  of 
tests  of  significance  in  the  interpretation  of  data  will  be  discussed  later. 

1.2  THE  APPROACH  TO  THE  INTERPRETATION  OF 
EXPERIMENTAL  DATA 

Since  statistical  methods  are  effective  only  to  the  extent  that  they  assist 
in  the  interpretation  of  observations,  the  choice  of  methods  must  always 
be based on a knowledge and appreciation of the conditions of the experiment.
Only thus can realistic theories be framed and an appropriate
analysis  made.  A  mathematical  formulation  is  often  helpful  in  expressing 
the  basic  assumptions  clearly  and  concisely,  provided  it  also  serves  the 
end  of  a  valid  analysis  of  the  data. 

However,  although  it  is  helpful  in  any  particular  problem  to  formulate  a 
mathematical model, the use of general models is to be discouraged; the
student  or  experimenter,  accustomed  to  thinking  in  terms  of  such  general 
concepts,  is  liable  to  try  to  fit  all  experiments  into  one  of  the  familiar  models, 
rather  than  seeking  out  the  unique  characteristics  of  any  particular  problem. 

In  recent  years  there  has  been  a  tendency  to  treat  statistical  methods  in 
greater and greater generality, as part of a more mathematical approach;
often,  however,  this  obscures  the  distinctive  features  of  a  problem.  We 
shall  therefore  often  deal  with  particular  cases  that  seem  to  be  indicative 
of  profitable  lines  of  study,  rather  than  presenting  each  problem  in  its 
fullest  generality. 

In  particular,  although  many  of  the  results  of  regression  analysis  and  of 
multivariate  analysis  generally  may  be  succinctly  presented  in  matrix 
form,  we  have  used  matrix  notation  sparingly,  in  the  belief  that  the 
interpretation  is  clearer  when  the  results  are  set  out  in  scalar  notation. 

1.3    SCOPE  OF  REGRESSION  ANALYSIS 

Regression analysis may be defined as the estimation or prediction of
the value of one variable from the values of other given variables. In the
practical application of regression analysis, a number of interesting
questions arise. First, there are the estimation of the constants of a
regression  when  the  form  of  the  relationship  is  given  and  the  testing  of  the 
concordance  of  some  preassigned  regression  relation  with  the  data.  There 
is the related question of which variables should be included in the relationship.
Once a regression relation has been established, we may use it to
derive  estimates,  either  of  one  of  the  variables,  based  on  values  of  the 
others,  or  of  the  effects  of  the  other  variables  on  the  one  estimated.  Again, 
the  relationship  may  be  used  to  improve  the  accuracy  of  estimates  and  of 
regressions  by  eliminating  the  effects  of  uncontrolled  variables,  as  in 
covariance analysis.

A  useful  generalization  of  regression  analysis  is  to  the  relations  between 
two  sets  of  variables.  This  includes  on  the  one  hand  discriminant  analysis, 
which  turns  out  in  its  practical  aspects  to  be  the  representation  of  variables 
of  one  set  in  terms  of  regressions  on  variables  of  the  other  set.  On  the 
other  hand  there  is  the  determination  of  linear  functional  relations, 
important  in  certain  practical  problems  such  as  calibration. 

It  may  be  noted  here,  and  later  chapters  will  confirm,  that  many  of  the 
practical  problems  arising  in  the  interpretation  of  relations  between  two 
sets  of  variables  are  answered  fairly  directly  by  means  of  a  regression 
analysis. Although much of the theory of multivariate analysis is somewhat
sophisticated mathematically (see Anderson, 1958; Roy, 1957), it
does  not  appear  to  be  relevant  to  the  general  run  of  practical  problems. 

1.4    SIGNIFICANCE  TESTS  IN  GENERAL 

In  the  application  of  statistical  methods  to  experimental  data,  we  have 
to  decide  which  observed  effects  are  to  be  taken  into  consideration  in  the 
interpretation  of  the  results.  Since  experiments,  no  matter  how  carefully 
controlled,  are  subject  to  error,  we  need  to  be  able  to  distinguish  effects 
that  are  due  to  the  chance  variation  in  the  material  from  effects  that  arise 
from underlying differences. For instance, if a series of feeding experiments
on sheep gives results showing that an increase of molybdenum
intake is associated with a decrease in liver copper, at what stage is the
experimenter justified in regarding the effect as real, rather than merely a
manifestation of the random variation among his experimental animals?
Clearly, in making judgments of this kind, some objective basis must be
adopted. Such a basis is provided by tests of significance.

Because  of  their  importance  in  inference,  we  devote  the  remainder  of 
this  chapter  to  considering  some  of  the  characteristics  of  significance  tests. 
These  points  will  be  stressed  because,  in  the  practical  applications  of 
statistical  methods,  more  trouble  arises  from  ignoring  basic  points  of 
principle  than  from  neglect  of  the  more  elaborate  techniques. 
The role of significance tests in inference has been carefully set out by
Fisher  (1956),  to  which  the  reader  is  referred  for  a  fuller  discussion. 
Essentially,  a  significance  test  is  a  test,  based  on  the  observational  data,  of 
some  hypothesis  or  theory.  Along  with  the  hypothesis  is  specified  a  set 
of  observational  results  which,  according  to  the  hypothesis,  is  of  low 
probability.  If  the  observations  give  a  result  which  belongs  to  this  set, 
the  test  leads  to  the  rejection  of  the  hypothesis.  Naturally,  such  a  test 
does  not  provide  a  full  interpretation  of  the  data.  It  is  simply  a  rule  for 
deciding  what  evidence  against  the  hypothesis  is  admissible.  When  the 
evidence  is  already  admitted  on  other  grounds,  or  when  the  hypothesis  is 
already  untenable  because  of  other  considerations,  the  test  of  significance 
is  irrelevant. 

Significance  tests  have  enabled  experimenters  to  make  inferences  in  an 
objective  and  orderly  manner,  unaffected  by  personal  bias.  As  a  result, 
there  has  been  a  tendency  to  appeal  to  a  significance  test  about  almost 
every  difference  or  relationship  that  is  under  examination.  One  effect  of 
this  popularity  of  significance  tests  has  been  the  proliferation  of  various 
tests  for  use  in  different  situations.  Some  of  these  tests  are  a  welcome 
addition  to  the  experimenter's  armory  of  techniques;  others,  however, 
although  valid  as  tests  of  some  hypothesis,  turn  out  not  to  be  valid  for 
supplying  the  answers  to  questions  in  which  the  experimenter  is  interested. 
In  relation  to  any  body  of  data  we  need  to  decide,  therefore,  first,  whether 
a  significance  test  is  necessary  to  its  interpretation,  and,  second,  what  the 
relevant  test  is.  In  considering  various  tests  during  the  course  of  this 
book  we  shall  try  to  indicate  the  situations  to  which  they  are  appropriate 
and  the  questions  they  can  answer. 

1.5    LIMITATIONS  OF  SIGNIFICANCE  TESTS 

The  tendency  to  base  the  interpretation  of  data  entirely  on  the  results  of 
significance  tests  has  its  dangers.  Many  workers  apply  significance  tests 
excessively,  sometimes  at  the  expense  of  sound  judgment  and  a  careful 
over-all  assessment  of  the  work.  A  sound  interpretation  will  take  into 
account  not  only  such  individual  tests  as  are  made  but  also  prior  knowledge 
and  experience  and  the  general  consistency  of  the  effects  that  show  up. 

Thus,  in  analyzing  the  results  of  a  study  of  the  rate  of  fleece  growth  of 
four  different  breeds  of  sheep,  some  workers  fitted  polynomial  trends  to 
the  relation  of  fleece  length  to  time.  It  was  found  that  for  some  breeds  a 
quadratic regression gave a satisfactory fit, the cubic term being nonsignificant,
but that for other breeds a cubic regression was required.
However,  with  sets  of  similar  data  such  as  this,  it  is  reasonable  to  assume 
that  the  regression  relation  is  of  the  same  form  for  each  set.    It  would 
therefore be reasonable to decide, on the basis of the data for all breeds
together,  whether  the  cubic  term  contributed  to  the  fit  of  the  curves  and 
then  to  use  either  a  quadratic  or  a  cubic  polynomial  consistently  for  all 
the  breeds. 

Methods  of  making  tests  on  combined  sets  of  data  will  be  discussed 
later,  especially  in  Chapter  8.  It  is  sufficient  to  say  here,  however,  that 
numerous  tests,  of  varying  degrees  of  complexity,  have  been  devised  to 
test  combined  data  or  multiple  comparisons  from  one  set  of  data;  yet 
it  is  seldom  that  their  application  results  in  much  improvement  over  the 
exercise  of  good  judgment  and  appeal  to  previous  experience.  Certainly 
their  application  without  care  and  good  judgment  can  give  misleading 
answers. 

A  statement  by  Yates  (1951)  is  worth  quoting  here: 

The  emphasis  on  tests  of  significance,  and  the  consideration  of  the  results 
of  each  experiment  in  isolation,  have  had  the  unfortunate  consequence  that 
scientific  workers  have  often  regarded  the  execution  of  a  test  of  significance  on 
an  experiment  as  the  ultimate  objective.  Results  are  significant  or  not  and  that 
is  the  end  of  it. 

Research  workers,  therefore,  have  to  accustom  themselves  to  the  fact  that 
in  many  branches  of  research  the  really  critical  experiment  is  rare,  and  that  it 
is  frequently  necessary  to  combine  the  results  of  numbers  of  experiments  dealing 
with  the  same  issue  in  order  to  form  a  satisfactory  picture  of  the  true  situation. 

1.6    PRIOR  CONSIDERATIONS 

Another  point  to  note  is  that  often  prior  considerations,  or  the  conditions 
of  the  experiment,  indicate  that  an  effect  or  difference  does  exist.  In  that 
case  not  only  is  a  significance  test  unnecessary  but  its  use  is  unwise.  An 
effect  known  to  exist  may  yet  be  too  small  to  be  revealed  by  a  significance 
test, so that the appeal to such a test leads to an apparent logical contradiction.
It will mean the replacing of a decisive rejection of the hypothesis
by  a  possibility  of  rejection,  depending  on  the  outcome  of  an  experiment. 
Actually  all  that  the  test  tells  us  is  that  the  data  provide  no  evidence  for 
the  existence  of  the  effect;  but  such  evidence,  even  if  found,  would  be 
redundant. 

The  evidence  which  a  significance  test  admits  is,  in  general,  of  a  less 
conclusive  kind  than  that  based  on  prior  considerations.  If  we  have  no 
prior  information  about  an  effect,  as  in  the  comparing  of  a  new  treatment 
for  the  common  cold  with  the  standard  treatment,  we  can  draw  conclusions 
only  from  a  significance  test.  The  hypothesis  that  there  is  no  difference 
between  the  treatments  is  tenable  and  may  even  be  supported  by  opinion ; 
therefore  any  evidence  against  it  must  be  derived  from  the  observations 
alone, and the outcome will be uncertain, depending as it does on the
size of the sample used, the inherent variability of the material, and the
tests  employed.  On  the  other  hand,  if,  for  example,  we  are  comparing 
different  species  of  timber,  we  know  that  they  differ  in  their  mechanical 
and other properties; it may happen that, for two given species, some of
these  differences  are  rather  small,  but  this  does  not  justify  our  concluding, 
on  the  basis  of  some  test  of  significance,  that  the  species  do  not  differ  in 
these  respects.  In  such  cases  what  is  of  interest  is  some  fiducial  range 
within  which  the  true  difference  may  be  expected  to  lie.  Sometimes, 
with  small  observed  differences,  this  range  for  the  true  difference  will 
include  zero,  but  this  fact  does  not  make  such  a  range  any  less  useful;  the 
range  still  shows  the  extent  to  which  the  true  difference  is  indeterminate. 

1.7    STATISTICAL  AND  PRACTICAL  SIGNIFICANCE 

A  distinction  readily  appreciated  by  the  experimenter  is  that  between 
the  significance  of  an  effect  as  determined  by  a  statistical  test  and  its 
practical  importance  when  its  existence  is  once  established.  For  example, 
it  may  be  that  nutrition  level  affects  the  prevalence  of  twinning  in  sheep, 
the  proportion  of  twin  births  being  raised  by  improved  nutrition  from  10 
to  12  per  cent.  The  significance  of  such  a  difference  could,  of  course, 
be  established  if  a  sufficiently  large  and  well-controlled  experiment  were 
undertaken; however, in view of the wide range of conditions of nutrition
and  other  environmental  factors  in  animal  husbandry,  and  the  large  effect 
of  season,  producing  presumably  even  larger  effects  on  the  twinning  rate, 
this  difference  would  probably  have  no  practical  importance.  The  results 
of  the  significance  test  would  then  be  ignored. 
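
The twinning example can be made concrete with a rough sketch. It assumes a two-sided normal-approximation test for the difference of two proportions, with equal numbers of births observed at each nutrition level; the function names, the flock sizes, and the choice of test are illustrative assumptions and are not taken from the text. The sketch only shows how a fixed difference of 10 against 12 per cent moves from nonsignificant to significant as the experiment grows, while its practical importance stays the same.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(p1, p2, n):
    """z statistic for two observed proportions, each based on n births.

    Uses the pooled-proportion standard error (an assumed, textbook
    normal approximation; not a method given in the source text).
    """
    pooled = (p1 + p2) / 2
    standard_error = sqrt(pooled * (1 - pooled) * 2 / n)
    return (p2 - p1) / standard_error


def two_sided_p(z):
    """Two-sided tail probability of a standard normal deviate."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical numbers of births per nutrition level.
for n in (200, 2000, 20000):
    z = two_proportion_z(0.10, 0.12, n)
    print(f"n = {n:6d}  z = {z:5.2f}  P = {two_sided_p(z):.4f}")
```

The observed difference is identical in every row; only the size of the experiment changes the verdict of the test.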

A  careful  experimenter  who  has  some  information  about  the  inherent 
variability  in  his  material  will  naturally  aim  to  design  his  experiments  so 
that  the  magnitude  of  differences  or  effects  judged  significant  will  be  what 
is  important  in  practice.  This  will  mean  that  all  the  effects  shown  up  by 
significance  tests  can  be  accepted,  and  that  none  of  the  information 
provided  by  the  experiment  is  wasted. 

1.8    INCORRECT  APPLICATIONS  OF  SIGNIFICANCE  TESTS 

In  order  to  throw  further  light  on  the  uses  of  significance  tests,  we  now 
discuss  some  examples  to  which  the  tests  are  incorrectly  applied,  owing  to 
neglect  of  the  principles  just  given.  Often  an  experiment  will  be  carried 
out  using  several  levels  of  a  quantitative  factor,  and  the  regression  of 
response  on  level  will  be  calculated.  If  the  regression  is  linear,  a  particular 
relationship  between  response  and  treatment  level  may  be  taken  to 
be  established.    Two  inappropriate  applications  are  sometimes  made. 
First, differences among individual response means may be tested. Now
the significance of the regression has established the existence of a linear
relation between response and level; hence, corresponding to any difference
in level, however small, there will exist a difference in response. On the
other hand, the difference between response means may well be nonsignificant
for levels sufficiently closely spaced. A supplementary
significance  test  on  differences  of  means  is  therefore  illogical  and  can  be 
misleading.  Second,  calculations  are  sometimes  made  to  determine 
what  difference  in  treatment  level  is  sufficient  to  produce  a  "significant" 
difference  in  response,  or,  rather,  what  difference  in  level  can  be  tolerated 
without  affecting  the  response  significantly.  As  explained  earlier,  once  a 
significant  regression  has  been  established,  differences  in  response  will 
exist  between  different  levels,  so  that  the  "significance"  of  the  difference  is 
irrelevant.  Probably  what  is  required  in  such  an  analysis  is  the  least 
difference  in  level  that  will  produce  a  practically  important  difference  in 
response.  This  is  easily  found  as  the  ratio  of  the  required  difference  in 
response  to  the  regression  coefficient  on  treatment  level. 
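
As a worked form of the ratio just described, write b for the estimated regression coefficient of response on treatment level and Δy* for the smallest difference in response regarded as practically important. Both symbols are introduced here only for illustration and are not the author's notation. The least difference in level, Δx*, is then

```latex
\Delta x^{*} = \frac{\Delta y^{*}}{\lvert b \rvert}
```

For instance, with the hypothetical values b = 0.5 response units per unit of level and Δy* = 2 response units, a difference of 4 units in level would be needed.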

Another  incorrect  application  is  to  adopt  a  hypothesis  because  a  test  of 
significance  shows  that  there  is  no  evidence  against  it.  A  typical  example 
is  in  the  fitting  of  a  regression  line  when  the  constant  term  turns  out  to  be 
small.  If  the  constant  term  is  found  not  to  be  significant,  the  test  shows 
that  a  line  through  the  origin  is  not  discordant  with  the  data,  but  it  does 
not  establish  that  this  is  the  appropriate  line.  It  is  not  correct  to  fit  a  line 
through  the  origin  unless  there  is  some  prior  reason  for  doing  so.  If,  on 
the  other  hand,  the  null  hypothesis  specifies  a  line  through  the  origin,  it  is 
appropriate  to  fit  such  a  line  unless  a  test  shows  that  the  hypothesis  is 
discordant  with  the  data.  When  the  line  is  assumed  to  pass  through  the 
origin,  the  restriction  on  the  variation  of  the  line  greatly  increases  the 
accuracy of determination of the regression coefficient; thus, to fit a line
through  the  origin  merely  because  the  departure  is  nonsignificant  gives  a 
spurious  appearance  of  accuracy  to  the  regression. 
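
The spurious gain in accuracy can be seen in a small numerical sketch. The data below are invented for illustration, and the function names are hypothetical; the formulas are the ordinary least-squares expressions for a line with a constant term and for a line constrained through the origin. With the observed levels lying well away from zero, the slope of the constrained line has a much smaller standard error, even though nothing in the data justifies the constraint.

```python
from math import sqrt


def fit_with_intercept(x, y):
    """Least-squares line y = a + b*x; returns (a, b, standard error of b)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_mean - b * x_mean
    residual_ss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    error_ms = residual_ss / (n - 2)   # two constants fitted
    return a, b, sqrt(error_ms / sxx)


def fit_through_origin(x, y):
    """Least-squares line y = b*x; returns (b, standard error of b)."""
    n = len(x)
    sum_xx = sum(xi * xi for xi in x)
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum_xx
    residual_ss = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    error_ms = residual_ss / (n - 1)   # one constant fitted
    return b, sqrt(error_ms / sum_xx)


# Invented data: levels far from the origin, small constant term.
x = [10, 11, 12, 13, 14, 15]
y = [20.8, 22.1, 24.7, 26.4, 28.9, 30.1]

a, b, se_b = fit_with_intercept(x, y)
b0, se_b0 = fit_through_origin(x, y)
print(f"with constant term: a = {a:.3f}, b = {b:.3f}, s.e.(b) = {se_b:.4f}")
print(f"through origin:     b = {b0:.3f}, s.e.(b) = {se_b0:.4f}")
```

The smaller standard error in the second line comes from the assumed restriction, not from the observations, which is why the restriction needs a prior justification.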

Similar,  although  not  so  common,  mistakes  arise  in  fitting  polynomial 
regressions,  using  either  successive  powers  or  orthogonal  polynomials  in 
the  independent  variable.  It  is  not  correct  to  omit  terms  merely  because 
their  contribution  is  nonsignificant,  unless  the  null  hypothesis  specifies 
that  the  corresponding  coefficients  vanish. 

These  considerations  show  that  significance  tests  have  important  but 
limited applications and are of most use in certain critical cases. Subsequent
chapters will give numerous examples of their application, in which
we  shall  endeavour  to  show  their  uses  and  limitations.  These  examples 
should  make  clearer  the  principles  of  inference  outlined  in  the  brief  and 
rather  abstract  discussion  just  given. 
1.9    FIDUCIAL LIMITS FOR PARAMETERS

The  null  hypothesis  will  often  specify,  among  other  things,  the  values  of 
certain parameters; a significance test will indicate whether the specified
values  of  the  parameters  are  concordant  with  the  data.  In  relation  to 
any  given  parameter,  it  is  convenient  to  consider  a  set  of  null  hypotheses, 
differing  from  one  another  only  in  the  value  assigned  to  the  parameter. 
The  set  of  hypotheses  that  are  not  significant  at  the  chosen  level  of 
probability P (say 0.01) will correspond in general to a range of values
for the parameter, which will be called the 1 − P (i.e., 99 per cent) fiducial
range; the limits of this range are called 1 − P (i.e., 99 per cent) fiducial
limits. 

This  argument  shows  that,  corresponding  to  any  significance  test  of  the 
value of a parameter, there will, in general, be a fiducial range of values;
this  provides  a  ready  means  of  developing  interval  estimates  for  the 
parameter. 
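
The construction can be sketched for the simplest case, chosen here purely for illustration: the mean of a normal sample whose standard deviation is assumed known, tested by a two-sided normal deviate. The function names and the numerical values are hypothetical. The sketch scans a grid of hypothesized means, keeps those not rejected at level P, and compares the resulting range with the closed-form limits; in this simple case the range coincides numerically with the usual confidence interval.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def is_rejected(mu0, sample_mean, sigma, n, p_level):
    """Two-sided test of the hypothesis mean = mu0, sigma assumed known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))
    return p_value < p_level


def fiducial_range_by_scan(sample_mean, sigma, n, p_level,
                           steps_per_se=1000, span_se=6):
    """Range of hypothesized means that the test does not reject.

    The grid runs span_se standard errors either side of the sample
    mean, in steps of 1/steps_per_se of a standard error.
    """
    standard_error = sigma / sqrt(n)
    kept = []
    for i in range(-span_se * steps_per_se, span_se * steps_per_se + 1):
        mu0 = sample_mean + i * standard_error / steps_per_se
        if not is_rejected(mu0, sample_mean, sigma, n, p_level):
            kept.append(mu0)
    return min(kept), max(kept)


def fiducial_range_closed_form(sample_mean, sigma, n, p_level):
    """The same limits obtained directly from the normal deviate."""
    half_width = STD_NORMAL.inv_cdf(1.0 - p_level / 2.0) * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Hypothetical values: mean 12.0 from n = 25 observations, sigma = 2.0.
print(fiducial_range_by_scan(12.0, 2.0, 25, 0.01))
print(fiducial_range_closed_form(12.0, 2.0, 25, 0.01))
```

The two results agree to within the grid spacing, which is the sense in which every significance test of a parameter value carries a range of values with it.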

Simultaneous  fiducial  limits  for  two  or  more  parameters  may  be  defined 
in  a  similar  way.  Some  problems  in  the  interpretation  of  such  limits  will 
be  discussed  in  Chapter  6. 

1.10    MULTIPLE  COMPARISONS 

One  recent  development  has  been  the  construction  of  simultaneous 
tests  of  several  comparisons,  in  which  the  significance  level  is  adjusted  to 
make  allowance  for  the  fact  that  more  than  one  test  has  been  performed. 
The  previous  discussion  shows  that  such  tests,  although  they  have  achieved 
some  prominence,  will  only  rarely  be  needed. 

Thus,  in  comparing  the  means  for  a  number  of  different  treatments,  if 
an  over-all  F  test  is  significant,  this  establishes  the  existence  of  treatment 
differences.  If  more  detailed  comparisons  are  required,  individual 
treatments  or  combinations  of  treatments  may  be  compared.  However, 
the  significance  of  the  difference  between  two  treatments  is  generally  less 
important  than  a  knowledge  of  a  fiducial  range  for  their  difference — or, 
better  still,  a  simultaneous  fiducial  range  for  all  the  treatment  means,  as 
described  in  the  previous  section.  This  latter  has  been  given  by  Fisher  in 
his  The  Design  of  Experiments  (Section  64). 

However, it may occasionally be of interest to make multiple comparisons
among several means. The simplest test to use for this purpose
is one proposed by Scheffé (1953). This test has the advantage that it
admits  every  possible  comparison  on  an  equal  basis,  and  that  its  significance 
levels  are  readily  derived  from  the  significance  points  of  the  F  distribution. 

The  test  is  based  on  the  fact  that,  if  there  are  m  means  to  be  compared, 
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the  maximum  value  of  the  sum  of  squares  for  any  comparison  is  equal  to 
the  sum  of  squares  between  means,  with  m  —  1  degrees  of  freedom. 
Accordingly,  in  applying  Scheffe's  test,  we  divide  the  sum  of  squares  for 
the  comparison  of  interest  by  m  —  1  and  compare  it  with  the  error  mean 
square  by  means  of  the  F  test. 
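As a rough illustration of the arithmetic, the sketch below applies Scheffé's procedure to a single comparison. It borrows the tear-factor values of Table 2.5 (Example 2.2); the function name `scheffe_f` and the choice of comparison (lowest against highest pressure) are assumptions made for illustration only.

```python
# Tear-factor data of Table 2.5: four sheets at each of five pressures.
groups = [
    [112, 119, 117, 113],
    [108, 99, 112, 118],
    [120, 106, 102, 109],
    [110, 101, 99, 104],
    [100, 102, 96, 101],
]


def scheffe_f(groups, coeffs):
    """Scheffe F ratio for the comparison sum(c_i * mean_i), where sum(c_i) = 0."""
    m = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    # Error (within-group) mean square on n - m degrees of freedom.
    within_ss = sum((y - mu) ** 2 for g, mu in zip(groups, means) for y in g)
    error_ms = within_ss / (n - m)
    # Sum of squares for the single comparison of interest.
    estimate = sum(c * mu for c, mu in zip(coeffs, means))
    comparison_ss = estimate ** 2 / sum(c ** 2 / len(g) for c, g in zip(coeffs, groups))
    # Scheffe's rule: divide by m - 1 before comparing with the error mean square.
    return (comparison_ss / (m - 1)) / error_ms, (m - 1, n - m)


# Hypothetical comparison: lowest pressure against highest pressure.
f_ratio, dof = scheffe_f(groups, [1, 0, 0, 0, -1])
```

The ratio is referred to the F distribution with m − 1 and n − m degrees of freedom (here 4 and 15), whatever comparison was chosen.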

This  test  is  clearly  rather  stringent  and  becomes  more  so  as  m  is  increased; 
the  lack  of  sensitivity  for  any  particular  comparison  is  the  price  that  has 
to  be  paid  for  the  possibility  of  testing  the  significance  of  any  comparison 
whatever,  without  regard  for  what  comparisons  are,  a  priori,  likely  to  be 
of  importance. 


CHAPTER    2 


Linear  Regression 


2.1    INTRODUCTORY 

The  linear  regression  relation,  or  straight-line  relation  between  two 
variables,  is  probably  familiar  to  all  who  have  occasion  to  consider 
relations  between  variables.  It  requires  only  the  most  elementary 
techniques  for  its  estimation.  Nevertheless,  it  will  be  considered  here  in 
sufficient  detail  to  bring  out  the  basic  principles  of  calculation  and 
estimation,  which  can  be  applied  to  more  complex  forms  of  relationship. 

For  the  purposes  of  this  chapter  we  shall  assume  that  n  pairs  of  values 
of  variables  x  and  y  have  been  observed,  and  that  we  are  prepared  to 
assume  that  there  exists  a  linear  regression  of  y  on  x;  that  is  to  say,  the 
expected  value  of  each  value  of  y  is  a  linear  function  of  the  corresponding 
value  of  x.    The  relationship  may  be  written 

$$E(y) = \beta_0 + \beta_1 x.$$

The two coefficients $\beta_0$ and $\beta_1$ are the regression coefficients, although $\beta_0$
is usually termed the intercept or constant term.

2.2    AN  EXAMPLE 

Example  2.1  Relation  between  Sodium  Concentration  and  Flame 
Photometer  Reading.  For  example,  we  may  consider  some  data  provided 
by  the  calibration  of  a  flame  photometer;  these  are  given  in  Table  2.1.  The 
photometer  is  designed  to  register  the  sodium  concentration  in  a  sample  of 
material  as  a  scale  reading,  the  reading  being  roughly  proportional  to  the 
sodium  concentration.  The  samples  were  made  up  with  preassigned  sodium 
contents  and  the  readings  for  each  observed.  In  this  instance,  the  scale  reading 
is  dependent  on  sodium  concentration,  so  that  the  regression  of  scale  reading 
on  sodium  concentration  is  appropriate.  Inspection  of  the  table  shows  that  the 
constant  of  proportionality  for  the  photometer  is  roughly  0.4.  The  calculations 
beneath  the  table  (which  will  be  discussed  later)  show  that  the  regression 
coefficient $b_1$ estimated from these data is 0.416 and that the constant term is
estimated as −0.89. Hence the regression equation is

$$Y = -0.89 + 0.416x.$$

This equation makes it possible to predict the scale reading corresponding to
a given sodium concentration. If a scale reading is given and the sodium
concentration is to be estimated, this equation may be transformed to give

$$X = (y + 0.89)/0.416 = 2.14 + 2.404y.$$

In  this  equation  X  has  been  employed  to  denote  an  estimated  value.  The 
equation  is  not  the  same  as  would  have  been  obtained  by  calculating  the 
regression  of  x  on  y;  as  x  is  a  variable  taking  fixed  values,  however,  it  cannot 
have  a  regression  on  y,  so  that  such  a  calculation  would  not  be  valid. 


TABLE 2.1
Sodium Concentrations and Corresponding Scale Readings of a Flame Photometer

                      Sodium Concentration, x,     Scale Reading, y
                      milliequivalents/liter

                                25                      10.0
                                50                      20.0
                                75                      29.5
                               100                      39.5
                               125                      52.0
                               150                      62.0
                               175                      72.0
                               200                      83.5
                               225                      91.5

Total                        1,125                     460.0
Mean                           125                     51.11
Sums of squares             37,500                 6,495.889
Sum of products                        15,600

Regression coefficient = 15,600/37,500 = 0.416
Constant term = 51.11 − 0.416 × 125 = −0.89
Sum of squares attributable to regression = 15,600²/37,500 = 6,489.600
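The calculations beneath Table 2.1 can be retraced with a short script. The following is a minimal sketch using only the tabulated values; the helper names `predict_reading` and `estimate_concentration` are hypothetical and simply mirror the two equations of Example 2.1.

```python
# Calibration data of Table 2.1.
x = [25, 50, 75, 100, 125, 150, 175, 200, 225]            # sodium concentration
y = [10.0, 20.0, 29.5, 39.5, 52.0, 62.0, 72.0, 83.5, 91.5]  # scale reading

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and products about the means.
sxx = sum((xi - x_bar) ** 2 for xi in x)                 # 37,500
syy = sum((yi - y_bar) ** 2 for yi in y)                 # 6,495.889
sxy = sum(yi * (xi - x_bar) for xi, yi in zip(x, y))     # 15,600

b1 = sxy / sxx                 # regression coefficient
b0 = y_bar - b1 * x_bar        # constant term
regression_ss = sxy ** 2 / sxx  # sum of squares attributable to regression


def predict_reading(concentration):
    """Scale reading predicted for a given sodium concentration."""
    return b0 + b1 * concentration


def estimate_concentration(reading):
    """Sodium concentration estimated by inverting the fitted line."""
    return (reading - b0) / b1
```

Inverting the fitted line, rather than regressing x on y, follows the argument of the example: the concentrations were fixed in advance and so have no regression on the readings.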

2.3    DEPENDENT  AND  INDEPENDENT  VARIABLES 

In considering the relation between two physical quantities, it is usual
to think of one variable as being the causal variable and to describe it as
the independent variable, the other variable being dependent on it. In
statistical analysis the same terms are used, but not in exactly the same
sense. The dependent variable is the one whose values are distributed
at random about the regression function, so that its expected value is
some function of the observed value of the independent variable. The
values of the independent variable need not be randomly distributed but
may have been fixed or selected in any manner.

In  the  statistical  sense,  therefore,  there  is  no  implication  that  the 
independent  variable  is  causal.  Very  often,  however,  values  of  the 
causal  variable  will  be  selected,  so  that  the  statistical  and  the  physical 
definitions  will  agree.  When  sampling  is  random  with  respect  to  both 
variables,  either  variable  may  be  regarded  as  independent  as  the  one  or 
the  other  is  used  to  predict  from. 

2.4    ESTIMATION  OF  THE  REGRESSION 
RELATIONSHIP 

If the distribution of values of y about $E(y)$ is normal, it is known that
the method of least squares is the efficient means of estimating the constants
$\beta_0$ and $\beta_1$. The estimates, denoted $b_0$ and $b_1$, are then sufficient statistics
for $\beta_0$ and $\beta_1$. Whether or not the distribution is normal, the method of
least  squares  gives  estimates  whose  distribution  tends  to  normality  the 
larger  the  size  of  the  sample  on  which  they  are  based.  Consequently, 
the  least-squares  method  of  estimation  will  be  used  and  it  will  be  assumed 
that  the  estimates  are  distributed  as  though  the  underlying  distribution  of 
errors  in  the  original  data  were  normal. 

The sum of squares to be minimized, with respect to the values $\beta_0$ and
$\beta_1$, is

$$S(y - \beta_0 - \beta_1 x)^2,$$

where S denotes summation over the members of the sample. The
minimization is usually carried out by means of the differential calculus,
but it is worth noting that it can be performed by elementary algebra, in a
way which gives simultaneously the estimates of the two constants and
the minimized sum of squares. This method of minimizing is, moreover,
not confined to simple regression but may be extended to multiple
regression equations. The sum of squares is

$$Sy^2 - 2\beta_0 Sy - 2\beta_1 Sxy + n\beta_0^2 + 2\beta_0\beta_1 Sx + \beta_1^2 Sx^2$$

$$= Sy^2 - (Sy)^2/n - 2\beta_1[Sxy - SxSy/n] + \beta_1^2[Sx^2 - (Sx)^2/n] + n(\beta_0 - \bar{y} + \beta_1\bar{x})^2$$

$$= S(y - \bar{y})^2 - [Sy(x - \bar{x})]^2/S(x - \bar{x})^2 + n(\beta_0 - \bar{y} + \beta_1\bar{x})^2 + S(x - \bar{x})^2[\beta_1 - Sy(x - \bar{x})/S(x - \bar{x})^2]^2,$$

where $\bar{x} = Sx/n$, $\bar{y} = Sy/n$.

From this partition of the sum of squares we see at once that: (i) Since
the third and fourth terms, which alone depend on $\beta_0$ and $\beta_1$, are perfect
squares whose minimum is consequently zero, the first two terms give
the sum of squares of departures, minimized with respect to the values of
the two parameters.

(ii) The third and fourth terms define the minimizing values of $\beta_0$ and
$\beta_1$, which are found to be

$$b_1 = Sy(x - \bar{x})/S(x - \bar{x})^2$$

and

$$b_0 = \bar{y} - b_1\bar{x}.$$

(iii) The sum of squares accounted for by the regression on x, that is,
the sum of squares indicating the departure of $b_1$ from zero, is obtained
from the final term with $\beta_1$ put equal to zero; in general, the final term,
with any assigned value of $\beta_1$, gives the sum of squares for the departure
of $b_1$ from the hypothetical value $\beta_1$.
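The algebraic partition can be checked numerically. The sketch below, which assumes the photometer data of Table 2.1 and uses the hypothetical function names `direct_ss` and `partitioned_ss`, evaluates the sum of squares both directly and as the four-term partition for arbitrary trial values of the two parameters.

```python
# Photometer data of Table 2.1.
x = [25, 50, 75, 100, 125, 150, 175, 200, 225]
y = [10.0, 20.0, 29.5, 39.5, 52.0, 62.0, 72.0, 83.5, 91.5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum(yi * (xi - x_bar) for xi, yi in zip(x, y))


def direct_ss(beta0, beta1):
    """Sum of squared departures S(y - beta0 - beta1*x)^2, computed term by term."""
    return sum((yi - beta0 - beta1 * xi) ** 2 for xi, yi in zip(x, y))


def partitioned_ss(beta0, beta1):
    """The same quantity written as the four-term partition of Section 2.4."""
    first = syy
    second = -sxy ** 2 / sxx
    third = n * (beta0 - y_bar + beta1 * x_bar) ** 2
    fourth = sxx * (beta1 - sxy / sxx) ** 2
    return first + second + third + fourth


# At the least-squares values the third and fourth terms vanish.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
minimum_ss = partitioned_ss(b0, b1)
```

Any other trial pair of values gives a larger sum of squares, since only the two squared terms change.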

2.5    TESTING  SIGNIFICANCE  OF  THE  REGRESSION 

This  simple  procedure  thus  gives  the  required  estimates  and  opens  the 
way  for  all  the  tests  of  significance  based  on  the  analysis  of  variance.  If 
we  write 

$$(n - 2)s^2 = S(y - \bar{y})^2 - [Sy(x - \bar{x})]^2/S(x - \bar{x})^2,$$

then $s^2$ is an estimate of the residual variance of y, based on n − 2 degrees
of freedom. The analysis of variance then takes the form shown in
Table 2.2.


TABLE 2.2

              D.F.     Sum of Squares                                                  Mean Square

Regression    1        $[Sy(x - \bar{x})]^2/S(x - \bar{x})^2$                          $b_1 Sy(x - \bar{x})$
                       $= b_1 Sy(x - \bar{x})$
                       $= b_1^2 S(x - \bar{x})^2$
Residual      n − 2    $S(y - \bar{y})^2 - [Sy(x - \bar{x})]^2/S(x - \bar{x})^2$       $s^2$
                       $= (n - 2)s^2$

Total         n − 1    $S(y - \bar{y})^2$

The analysis of variance for the example is given in Table 2.3. This
analysis provides the test for the significance of $b_1$. Alternatively, the
derivation shows that the standard error of $b_1$ is

$$\sqrt{\frac{s^2}{S(x - \bar{x})^2}}.$$

Fiducial limits for the value of $\beta_1$ may be derived using the value of $b_1$
and its standard error.


TABLE 2.3
Analysis of Variance of Photometer Readings

                              D.F.     Sum of Squares     Mean Square

Regression on sodium
  concentration                1         6489.600           6489.6
Residual                       7            6.289           0.8984

Total                          8         6495.889

If t is the 1 per cent point of the t distribution with n − 2 degrees of
freedom, the 99 per cent fiducial limits for $\beta_1$ are given as

$$b_1 \pm \frac{ts}{\sqrt{S(x - \bar{x})^2}}.$$

For the example, the standard error of $b_1$ = 0.0049 and t = 3.499 so that
the fiducial limits are

$$0.416 \pm 3.499 \times 0.0049 = 0.399 \text{ and } 0.433.$$
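The limits just quoted follow from the summary figures of Tables 2.1 and 2.3. The sketch below retraces the arithmetic; the value of t is taken from the text rather than computed, and the variable names are illustrative.

```python
import math

# Summary figures from Tables 2.1 and 2.3.
n = 9
sxx = 37500.0      # S(x - x_bar)^2
sxy = 15600.0      # Sy(x - x_bar)
syy = 6495.889     # S(y - y_bar)^2
t_one_percent = 3.499   # two-sided 1 per cent point of t on 7 d.f., as quoted in the text

b1 = sxy / sxx
residual_ss = syy - sxy ** 2 / sxx
s2 = residual_ss / (n - 2)          # residual mean square
se_b1 = math.sqrt(s2 / sxx)         # standard error of b1

lower = b1 - t_one_percent * se_b1
upper = b1 + t_one_percent * se_b1
```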

2.6    TEST  FOR  THE  INTERCEPT 

One  other  aspect  of  the  regression  equation  is  often  worth  examining. 
In  the  example  of  calibration  of  a  photometer  it  is  reasonable  to  assume 
that  the  readings  will  be  proportional  to  the  actual  sodium  concentration. 
On  this  basis  the  regression  equation  would  be 

$$E(y) = \beta_1 x;$$

in other words, the constant $\beta_0$ giving the intercept on the Y-axis would be
zero. For a test of the concordance of the data with this assumption,
we need a different partition of the residual sum of squares from that
given before. The sum of squares involving $\beta_0$ and $\beta_1$ is

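$$n(\beta_0 - \bar{y} + \beta_1\bar{x})^2 + S(x - \bar{x})^2\left[\beta_1 - \frac{Sy(x - \bar{x})}{S(x - \bar{x})^2}\right]^2,$$

which may be rewritten as

$$Sx^2\left[\beta_1 - \frac{Syx}{Sx^2} + \frac{\beta_0 Sx}{Sx^2}\right]^2 + \frac{nS(x - \bar{x})^2}{Sx^2}\left[\beta_0 - \frac{SySx^2}{nS(x - \bar{x})^2} + \frac{SxSyx}{nS(x - \bar{x})^2}\right]^2$$

$$= Sx^2\left[\beta_1 - b_1 + \frac{(\beta_0 - b_0)Sx}{Sx^2}\right]^2 + \frac{nS(x - \bar{x})^2}{Sx^2}(\beta_0 - b_0)^2.$$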
The second term of this expression provides the criterion for testing the
significance of any assigned value of $\beta_0$; in particular, if it is assumed
that $\beta_0 = 0$, this term becomes

$$\frac{(SySx^2 - SxSyx)^2}{nSx^2 S(x - \bar{x})^2} = \frac{nb_0^2 S(x - \bar{x})^2}{Sx^2}.$$

This sum of squares may be tested against $s^2$ in the usual way; alternatively,
the expression shows that the standard error of $b_0$ is estimated as

$$\sqrt{\frac{s^2 Sx^2}{nS(x - \bar{x})^2}}.$$

Fiducial limits for $\beta_0$ may be determined using the value of $b_0$ and its
standard error; a nonzero value would indicate that the instrument had a
"zero error," for which an adjustment would have to be made.

For the photometer example, we have $b_0 = -0.89$. Such a negative value
would not be a possible scale reading. We also have $Sx^2 = 178,125$, so that
the standard error of $b_0$ is

$$\sqrt{\frac{0.8984 \times 178,125}{9 \times 37,500}} = 0.69.$$

The  ratio  of  b0  to  its  standard  error  is  —0.89/0.69  =  —1.29,  which  is  not 
significant.  The  assumption  that  the  line  passes  through  the  origin  is  therefore 
acceptable. 
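The test of the intercept uses only quantities already tabulated. The following sketch reproduces it from the summary figures of Table 2.1, with the residual mean square taken from Table 2.3; the variable names are illustrative.

```python
import math

# Summary figures from Tables 2.1 and 2.3.
n = 9
x_bar = 125.0
y_bar = 460.0 / 9
sxx = 37500.0       # S(x - x_bar)^2
b1 = 0.416
s2 = 0.8984         # residual mean square on 7 d.f.

# Uncorrected sum of squares Sx^2 = S(x - x_bar)^2 + n * x_bar^2.
raw_sxx = sxx + n * x_bar ** 2

b0 = y_bar - b1 * x_bar
se_b0 = math.sqrt(s2 * raw_sxx / (n * sxx))
ratio = b0 / se_b0   # referred to the t distribution on n - 2 = 7 d.f.
```

The ratio falls well short of the 1 per cent point 3.499 quoted in Section 2.5, which is the sense in which the intercept is judged not significant.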

To fit a line through the origin, the sum of squares to be minimized is
simply

$$S(y - \beta_1 x)^2,$$

whence the estimate, denoted $b_1'$ to distinguish it from the earlier estimate,
is found to be

$$b_1' = Syx/Sx^2,$$

with variance

$$s^2/Sx^2.$$

In the present example,

$$b_1' = 0.4104$$

with variance

$$0.8984/178,125 = 0.000\,005\,044$$

so that the standard error of $b_1'$ equals 0.00225.

It is noted that the standard error of the slope of the line through the
origin is much less than that of $b_1$. This is generally true; the reason is
that the assumption that the line passes through a given point contributes
information which was not made use of in determining $b_1$; or, put in
another way, the slope is restricted in its variation by the condition that
the line passes through the origin.



On  the  assumption  that  the  line  passes  through  the  origin,  the  99  per 
cent  fiducial  limits  for  the  regression  coefficient  are 

0.4104  ±  3.499  x  0.00225 
=  0.4025    and    0.4183. 
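For comparison with the two-parameter fit, the line through the origin can be computed from the same summary figures. This is a minimal sketch; following the text, it uses the residual mean square of Table 2.3 and the value of t quoted in Section 2.5, and the names are illustrative.

```python
import math

# Summary figures from Tables 2.1 and 2.3.
n = 9
x_bar = 125.0
sum_y = 460.0
sxx = 37500.0       # S(x - x_bar)^2
sxy = 15600.0       # Sy(x - x_bar)
s2 = 0.8984         # residual mean square, as used in the text
t_one_percent = 3.499

# Uncorrected sums: Sx^2 and Syx.
raw_sxx = sxx + n * x_bar ** 2      # 178,125
raw_sxy = sxy + x_bar * sum_y       # 73,100

slope_origin = raw_sxy / raw_sxx            # b1'
se_origin = math.sqrt(s2 / raw_sxx)         # standard error of b1'
se_general = math.sqrt(s2 / sxx)            # standard error of b1, for comparison

lower = slope_origin - t_one_percent * se_origin
upper = slope_origin + t_one_percent * se_origin
```

Working at full precision gives limits that agree with the printed 0.4025 and 0.4183 to within rounding in the fourth decimal place.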

2.7    LINEAR  REGRESSION  WITH  GROUPED  DATA 

In many experiments a number of values of the dependent variable will
be observed corresponding to each value of the independent variable.
For example, an experiment may be repeated on a number of days, to
improve the precision of the estimates for each value of the independent
variable. Again, it is sometimes convenient with large bodies of data to
group the values of the independent variable into classes; the values of
the independent variable will be replaced by the median values of the
classes in which they fall, and to each class will correspond a varying
number of values of the dependent variable.

TABLE 2.4

                               D.F.      Sum of Squares

Between groups, regression     1         $[S_r n_r \bar{y}_r(x_r - \bar{x})]^2 / S_r n_r(x_r - \bar{x})^2$
  deviation from regression    m − 2     by subtraction

Total between groups           m − 1     $S_r n_r(\bar{y}_r - \bar{y})^2$
Within groups                  n − m     by subtraction

Total                          n − 1     $S(y - \bar{y})^2$

When the values are repeated in this way, the sum of squares of the
values of the dependent variable may be analyzed into parts corresponding
to variation between groups and variation within groups, quite apart
from any consideration of regression: if there are m groups, and $n_r$ values
in the rth group, whose mean is $\bar{y}_r$, the now-familiar analysis is

$$S(y - \bar{y})^2 = S_r n_r(\bar{y}_r - \bar{y})^2 + SS(y - \bar{y}_r)^2,$$

the corresponding partition of degrees of freedom being

$$\underbrace{(n - 1)}_{\text{total}} = \underbrace{(m - 1)}_{\text{between groups}} + \underbrace{(n - m)}_{\text{within groups}}.$$

The  variation  between  groups  can  be  further  analyzed  into  that  part 
attributable  to  regression  on  the  independent  variable  and  the  residual 
not so attributable; we then have the analysis of variance shown in Table
2.4, in which $x_r$ is the value of the independent variable for the rth group.
The  value  of  this  form  of  analysis,  rather  than  the  simple  analysis  into 
regression  and  residual,  is  that  it  enables  us  to  test  the  adequacy  of  the 
linear  regression  relation  to  represent  the  data.  If  the  mean  square  for 
deviation  from  regression  is  significantly  greater  than  that  within  groups, 
the  hypothesis  that  the  relation  between  the  variables  is  linear  must  be 
rejected.  It  is  then  immaterial  whether  the  regression  is  significant  or 
not,  since  it  does  not  satisfactorily  represent  the  data.  What  alternative 
relationships  may  be  fitted  when  a  linear  one  is  not  satisfactory  will  be 
discussed  in  Chapters  3  and  4.  The  test  for  deviation  from  linearity 
will  now  be  applied  to  a  numerical  example. 

Example  2.2     Effect  of  Pressure  in  Sheetmaking  on   Tear  Factor. 

In  a  laboratory  experiment,  which  was  repeated  on  four  days,  paper  was  made 
under  five  different  pressures  during  sheet  pressing.  Various  properties  of  the 
resulting  sheets  were  recorded,  including  tear  factor,  whose  values  are  given  in 
Table  2.5.  Since  the  variation  between  days  was  not  significant,  it  has  been 
ignored  for  this  example. 

TABLE 2.5
Values of Tear Factor at Different Pressures

Pressure    $x = \log_{\sqrt{2}}(\text{pressure}/71)$    Tear Factor, y          Total     Mean

   35                  −2                                112   119   117   113    461     115.25
   50                  −1                                108    99   112   118    437     109.25
   71                   0                                120   106   102   109    437     109.25
  100                   1                                110   101    99   104    414     103.50
  141                   2                                100   102    96   101    399      99.75

The pressures were chosen in geometric progression (with ratio $\sqrt{2}$) because
it was expected that the properties would be approximately related to the
logarithm of pressure; accordingly, in the second column of Table 2.5 are
tabulated values of x, the logarithms of pressure to the base $\sqrt{2}$, reduced so as
to have mean zero; the regression of tear factor on these logarithmic values
was determined:

Sum of products = −2 × 461 − 1 × 437 + 0 × 437 + 1 × 414 + 2 × 399
                = −147
$Sx^2$ = 40

Regression sum of squares = (−147)²/40
                          = 540.22.

The analysis of variance given in Table 2.6 confirms the existence of the
linear relationship and moreover shows that the pressure means do not depart
from the regression relationship more than may be attributed to random error.


TABLE 2.6
Analysis of Variance of Tear Factor Values

                              D.F.     Sum of Squares     Mean Square

Regression on log pressure     1          540.22           540.22**
Deviation from regression      3           28.58             9.53(n)
Within pressures              15          492.00            32.80

Total                         19         1060.80

(n) Not significant
**  Significant at 1 per cent level
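The entries of Table 2.6 may be reproduced directly from the observations of Table 2.5. The sketch below follows the layout of Table 2.4; the variable names are illustrative, and the significance points themselves are not computed.

```python
# Coded log pressure and tear-factor observations from Table 2.5.
x_levels = [-2, -1, 0, 1, 2]
tear = [
    [112, 119, 117, 113],
    [108, 99, 112, 118],
    [120, 106, 102, 109],
    [110, 101, 99, 104],
    [100, 102, 96, 101],
]

m = len(tear)                           # number of groups
n = sum(len(g) for g in tear)           # total number of observations
grand_mean = sum(sum(g) for g in tear) / n
x_bar = sum(x * len(g) for x, g in zip(x_levels, tear)) / n
means = [sum(g) / len(g) for g in tear]

total_ss = sum((y - grand_mean) ** 2 for g in tear for y in g)
between_ss = sum(len(g) * (mu - grand_mean) ** 2 for g, mu in zip(tear, means))
within_ss = total_ss - between_ss       # by subtraction, n - m d.f.

# Regression of the group means on x, each mean weighted by its group size.
sxy = sum(len(g) * mu * (x - x_bar) for x, g, mu in zip(x_levels, tear, means))
sxx = sum(len(g) * (x - x_bar) ** 2 for x, g in zip(x_levels, tear))
regression_ss = sxy ** 2 / sxx          # 1 d.f.
deviation_ss = between_ss - regression_ss   # by subtraction, m - 2 d.f.

deviation_ms = deviation_ss / (m - 2)
within_ms = within_ss / (n - m)
f_linearity = deviation_ms / within_ms  # test of departure from linearity
f_regression = regression_ss / within_ms
```

A ratio below unity for the deviation from regression, as here, gives no indication of departure from linearity.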


2.8    TRANSFORMATION  OF  DATA  BEFORE  ANALYSIS 

The  regression  analysis  previously  described  may  be  validly  performed 
provided  (i)  there  is  a  linear  relationship  between  the  expected  value  of  y 
and  the  value  of  x,  and  (ii)  the  variance  of  y  about  its  expected  value  is  the 
same  for  all  x;  in  the  two  examples  given  these  conditions  were  satisfied 
for  all  practical  purposes.  Fortunately,  even  when  these  conditions  do 
not  hold,  it  is  very  often  possible  to  modify  the  original  variables  in  some 
way  so  that  the  new  variables  satisfy  the  conditions.  In  Example  2.2,  for 
instance,  transforming  the  pressure  to  logarithms  gave  a  satisfactory 
linear  regression  equation.  (The  fact  that,  in  this  particular  case,  the 
regression  on  untransformed  pressure  does  not  depart  significantly  from 
linearity,  since  the  errors  are  high,  does  not  affect  the  argument.)  Very 
often,  some  transformation  of  one  or  the  other  variable  will  bring 
the  relationship  to  linear  form;  the  transformations  most  commonly 
employed  are  the  logarithmic,  the  square  root,  and  the  reciprocal.  Many 
examples  may  be  found  in  the  literature,  so  that  it  is  not  necessary  to  give 
examples  here. 

An example of a rather complicated relationship which may be reduced
to linear form is Hankinson's formula for the failing stress in timber at
various angles to the grain direction; this is

$$f_\theta = \frac{f_c f_c'}{f_c \sin^2\theta + f_c' \cos^2\theta},$$

where $f_c$ is the stress parallel to the grain, $f_c'$ is the stress perpendicular to
the grain, and $f_\theta$ is the stress at an angle $\theta$ to the grain direction. This
formula may be written

$$\frac{1}{f_\theta} = \frac{\cos^2\theta}{f_c} + \frac{\sin^2\theta}{f_c'} = \frac{1}{f_c} + \left(\frac{1}{f_c'} - \frac{1}{f_c}\right)\sin^2\theta,$$

so that, if values of $f_\theta$ are observed for various values of $\theta$, $f_c$ and $f_c'$ may
be estimated from the constants of the linear regression of $1/f_\theta$ on $\sin^2\theta$.
If we put $y = 1/f_\theta$ and $x = \sin^2\theta$ and the regression equation is

$$Y = b_0 + b_1 x,$$

the estimates of $f_c$ and $f_c'$ are $1/b_0$ and $1/(b_0 + b_1)$ respectively.
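To see the linearization at work, the sketch below generates error-free stresses from Hankinson's formula and recovers the two constants from the regression of the reciprocal stress on $\sin^2\theta$. The stresses `TRUE_PAR` and `TRUE_PERP` and the list of angles are hypothetical values chosen for illustration only.

```python
import math


def hankinson(theta_deg, f_par, f_perp):
    """Failing stress at angle theta (degrees) to the grain, by Hankinson's formula."""
    t = math.radians(theta_deg)
    return f_par * f_perp / (f_par * math.sin(t) ** 2 + f_perp * math.cos(t) ** 2)


# Hypothetical stresses parallel and perpendicular to the grain.
TRUE_PAR, TRUE_PERP = 16000.0, 1200.0
angles = [0, 10, 20, 30, 45, 60, 90]

# Transformed variables: x = sin^2(theta), y = 1 / f_theta.
x = [math.sin(math.radians(a)) ** 2 for a in angles]
y = [1.0 / hankinson(a, TRUE_PAR, TRUE_PERP) for a in angles]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = sum(yi * (xi - x_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

f_par_hat = 1.0 / b0            # estimate of f_c
f_perp_hat = 1.0 / (b0 + b1)    # estimate of f_c'
```

With error-free data the constants are recovered exactly, apart from rounding; with real observations the question of weighting, taken up next, becomes important.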

The  second  condition,  homogeneity  of  variance,  is  approximately 
satisfied  for  most  sets  of  data  in  which  the  over-all  variation  of  results  is 
small  compared  with  the  error  variance.  If  this  condition  does  not  hold, 
each  value  of  the  dependent  variable  must  be  weighted  inversely  as  its 
variance  for  an  efficient  estimate  of  the  regression  coefficient  to  be 
obtained.  The  unweighted  regression  coefficient  will  still  be  unbiased, 
but  it  will  be  less  accurately  determined  than  the  weighted  coefficient. 
For  this  reason  an  efficient  analysis  is  not  possible  unless  the  weights  are 
known or can be estimated. If the weight of the values of y in the rth
group is $w_r$, the weighted regression coefficient is

$$b_{1w} = S_r n_r w_r \bar{y}_r(x_r - \bar{x}) / S_r n_r w_r(x_r - \bar{x})^2,$$

and the sum of squares due to regression is

$$b_{1w} S_r n_r w_r \bar{y}_r(x_r - \bar{x}).$$

Sometimes the weights will be given independently of the values of x,
but more often they are functions of x. For example, the variance of y
is often proportional to the square of its expected value, that is, to
$(\beta_0 + \beta_1 x)^2$, so that the appropriate weight is $(\beta_0 + \beta_1 x)^{-2}$.

When  the  variance  of  y  is  a  function  of  its  expected  value,  determination 
of  the  regression  coefficient  requires  iterative  methods,  which  will  be 
discussed  in  Chapter  4.  The  method  is  to  determine,  by  an  unweighted 
analysis or some other approximate method, estimates of $\beta_0$ and $\beta_1$, and
hence of the weights corresponding to each value of x. The analysis
using these weights gives improved estimates, which by repeated application
may be determined to any desired degree of accuracy. Usually
two iterations are sufficient to reduce the error of calculation to a
smaller order than the standard error of the estimates.
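The iteration may be sketched as follows, for the case in which the variance of y is proportional to the square of its expected value. The data are hypothetical, the function names `weighted_fit` and `iterated_fit` are illustrative, and the means in the weighted formulas are taken to be weighted means, which is an assumption since the text does not say so explicitly.

```python
def weighted_fit(x, y, w):
    """Weighted least-squares line; returns (b0, b1). Means are weighted means."""
    sw = sum(w)
    x_w = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_w = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * yi * (xi - x_w) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_w) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return y_w - b1 * x_w, b1


def iterated_fit(x, y, rounds=2):
    """Start from the unweighted fit, then reweight by (b0 + b1*x)^-2 each round."""
    b0, b1 = weighted_fit(x, y, [1.0] * len(x))
    for _ in range(rounds):
        weights = [(b0 + b1 * xi) ** -2 for xi in x]
        b0, b1 = weighted_fit(x, y, weights)
    return b0, b1


# Hypothetical observations scattered about the line 2 + 3x,
# with errors roughly proportional to the expected value.
x = [1, 2, 3, 4, 5, 6]
y = [5.1, 7.76, 11.11, 14.56, 16.66, 19.8]
b0, b1 = iterated_fit(x, y)
```

The default of two rounds follows the remark in the text; in a doubtful case the number of rounds can be increased until successive estimates agree.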

A  simple  case,  which  may  be  dealt  with  by  the  methods  of  this  chapter, 
arises when the variance is known to be proportional to $x^2$. Then $y/x$
has constant variance. Since the regression equation can be written

$$Y/x = b_1 + b_0/x,$$

the regression coefficient and constant term may be estimated from the
regression of $y/x$ on $1/x$. This case is illustrated in the following example.

Example 2.3 Fitting the Hankinson Formula to Experimental
Data. As explained in Section 2.8, the Hankinson formula for the failing
stress of timber at various angles to the direction of grain may be expressed as a
linear relationship between the reciprocal of the failing stress and $\sin^2\theta$. A set
of experimental determinations of modulus of rupture at various angles to the
grain direction is set out in Table 2.7. It was considered that the variance of
the reciprocals would be approximately proportional to $(0.1 + \sin^2\theta)^2$. Hence
the weighted regression of $10^6$ times the reciprocal of modulus of rupture, y, on
$0.1 + \sin^2\theta$ was determined. As explained earlier, this is equivalent to
determining the regression of

$$z = y/(0.1 + \sin^2\theta)$$

on

$$x = 1/(0.1 + \sin^2\theta).$$

Accordingly, values of y, z, and x are also given in Table 2.7.

TABLE 2.7
Relationship between Modulus of Rupture and Angle of Grain

Angle        Modulus of    $10^6$/M. of R.,                  $1/(0.1 + \sin^2\theta)$,    $y/(0.1 + \sin^2\theta)$,
$\theta$, °  Rupture       y                  $\sin^2\theta$      x                            z

  0           16,880          59.24            0.0000             10.00                        592
  2.5         14,720          67.93            0.0019              9.81                        667
  5           14,340          69.74            0.0076              9.29                        648
  7.5         12,740          78.49            0.0170              8.55                        671
 10           12,390          80.71            0.0302              7.68                        620
 15            7,140         140.1             0.0670              5.99                        839
 20            7,170         139.5             0.1170              4.61                        643
 30            4,710         212.3             0.2500              2.86                        607
 45            2,280         438.6             0.5000              1.67                        731
 60            1,720         581.4             0.7500              1.18                        684
 90              970       1,031               1.0000              0.91                        937

Total                                                             62.55                      7,639
Sums of squares                                                  129.2643                  110,965
Sum of products                                                           −1,849.87



The regression coefficient of z on x, which is designated $b_0$, is

$$-1849.87/129.2643 = -14.31.$$

The constant term is

$$b_1 = \bar{z} - b_0\bar{x} = 775.83.$$

Hence the weighted regression equation is

$$Z = -14.31x + 775.83.$$

Expressed in terms of y and $\sin^2\theta$, the equation is

$$Y = -14.31 + 775.83(0.1 + \sin^2\theta)$$

or

$$Y = 63.27 + 775.83\sin^2\theta.$$

As shown in Section 2.8, estimates of $f_c$ and $f_c'$ may be determined from this
equation:

$$f_c = 10^6/63.27 = 15,800$$

$$f_c' = 10^6/(63.27 + 775.83) = 1190.$$
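The arithmetic of the example can be followed from the totals at the foot of Table 2.7. The sketch below uses only those summary figures; the variable names are illustrative.

```python
# Summary figures from the foot of Table 2.7.
n = 11
sum_x = 62.55        # total of x = 1/(0.1 + sin^2 theta)
sum_z = 7639.0       # total of z = y/(0.1 + sin^2 theta)
sxx = 129.2643       # sum of squares of x about its mean
sxz = -1849.87       # sum of products of x and z about their means

x_bar = sum_x / n
z_bar = sum_z / n

b0 = sxz / sxx               # regression coefficient of z on x
b1 = z_bar - b0 * x_bar      # constant term of the regression of z on x

# Back in terms of y and sin^2 theta: Y = b0 + b1 * (0.1 + sin^2 theta).
intercept = b0 + 0.1 * b1
slope = b1

f_c = 1e6 / intercept                 # stress parallel to the grain
f_c_perp = 1e6 / (intercept + slope)  # stress perpendicular to the grain
```

Note the exchange of roles: the slope of z on x becomes the constant part of the relation between y and $\sin^2\theta$, and the constant term becomes its slope.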

The analysis of variance, giving the residual mean square, is set out in Table
2.8. Note that, in this analysis, the significance of the regression on x is not
relevant, since the regression coefficient is related to the constant term in the
regression of y on $\sin^2\theta$.

TABLE 2.8
Analysis of Variance of Weighted Values, z

                     D.F.     Sum of Squares     Mean Square

Regression on x       1          26,473            26,470
Residual              9          84,492             9,388

Total                10         110,965
The standard errors of the estimates are found as follows. Since

$$10^6/f_c = b_0 + 0.1b_1 = b_0(1 - 0.1\bar{x}) + 0.1\bar{z},$$

then

$$V(10^6/f_c) = 9388\left[\frac{(1 - 0.5686)^2}{129.2643} + \frac{0.01}{11}\right] = 22.05,$$

and

$$\text{S.E.}(10^6/f_c) = 4.70.$$

Similarly,

$$10^6/f_c' = b_0 + 1.1b_1$$

and

$$\text{S.E.}(10^6/f_c') = 55.1.$$
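Both standard errors follow from the same formula, since each quantity has the form $b_0 + kb_1 = b_0(1 - k\bar{x}) + k\bar{z}$, with k = 0.1 and k = 1.1 respectively. A minimal sketch, using the figures of Tables 2.7 and 2.8 and a hypothetical helper `se_linear_combination`:

```python
import math

# Figures from Tables 2.7 and 2.8.
n = 11
x_bar = 62.55 / n
sxx = 129.2643
s2 = 9388.0          # residual mean square on 9 d.f.


def se_linear_combination(k):
    """Standard error of b0 + k*b1, written as b0*(1 - k*x_bar) + k*z_bar.

    b0 (the slope of z on x) and z_bar are uncorrelated, with variances
    s2/sxx and s2/n respectively.
    """
    variance = s2 * ((1 - k * x_bar) ** 2 / sxx + k ** 2 / n)
    return math.sqrt(variance)


se_parallel = se_linear_combination(0.1)        # S.E. of 10^6 / f_c
se_perpendicular = se_linear_combination(1.1)   # S.E. of 10^6 / f_c'
```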
2.9    ALLOCATION OF VALUES OF THE
INDEPENDENT VARIABLE

The formula for the standard error of the regression coefficient $b_1$
shows that the precision of the estimate increases with $S(x - \bar{x})^2$; the
greater  the  spread  of  values  of  the  independent  variable,  and  the  greater 
their  number,  the  more  accurate  the  regression  coefficient  will  be.  When 
it  is  possible  to  select  values  of  the  independent  variable,  it  will  be  desirable 
to  spread  them  as  much  as  possible ;  theoretically  the  greatest  spread  will 
be  obtained  when  the  values  are  selected  at  the  limits  of  the  permissible 
range  of  the  independent  variable,  half  the  values  being  at  each  limit. 
There  are,  however,  practical  objections  to  this  procedure.  If  only  two 
values  of  the  independent  variable  are  sampled,  it  is  not  possible  to  tell 
whether  the  regression  is  in  fact  linear  over  the  range.  For  this  reason  it 
is  preferable  to  allocate  the  values  at  several  points  throughout  the  range. 
For  computational  convenience  it  is  often  desirable  to  space  the  values 
equally. 

These  remarks  may  be  summarized  by  saying  that  the  optimum 
allocation  depends  on  the  use  to  which  the  data  are  to  be  put.  If  the 
regression  is  known  to  be  linear,  allocation  at  the  ends  of  the  range  is 
optimum ;  if  information  about  the  extent  of  departure  from  linearity  is 
wanted,  some  method  of  spacing  through  the  range  will  be  optimum. 
However,  it  is  seldom  that  the  experimenter  can  specify  in  advance  exactly 
what  information  he  seeks  from  the  data,  so  that  it  is  not  possible  to  define 
optimum  allocation  precisely.  Usually  it  is  most  convenient  to  take 
points  equally  spaced  in  the  range  considered  and  allocate  an  equal 
number  of  values  to  each. 
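The cost of spacing the values through the range, rather than placing them at the ends, can be gauged by comparing $S(x - \bar{x})^2$ for the two allocations. The sketch below assumes, purely for illustration, ten observations on a range scaled to run from 0 to 1.

```python
def spread(values):
    """Sum of squares of the values about their mean, S(x - x_bar)^2."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


n = 10  # hypothetical number of observations

# Half the values at each limit of the permissible range.
at_ends = [0.0] * (n // 2) + [1.0] * (n // 2)

# Values equally spaced through the range.
equally_spaced = [i / (n - 1) for i in range(n)]

# The standard error of b1 is inversely proportional to the square root of the spread.
relative_se = (spread(at_ends) / spread(equally_spaced)) ** 0.5
```

On these assumptions the equally spaced allocation gives a standard error for the slope about one and a half times that of the end-point allocation, which is the price paid for being able to examine linearity.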


CHAPTER    3 


Multiple  and  Polynomial  Regression 


3.1     GENERAL 

It  is  a  natural  extension  of  the  idea  of  simple  linear  regression  to  consider 
the  regression  of  one  variable  on  several  independent  variables.  The 
need  for  such  a  multiple  regression  relation  may  arise  either  from 
theoretical  considerations,  from  the  fact  that  the  relation  with  any  one 
independent  variable  does  not  give  a  high  enough  correlation  to  be  of 
much  value,  or  because  the  additional  variables  contribute  substantially 
to  an  already  high  correlation.  For  example,  it  is  known  that  the  strength 
of  timber  depends  on  its  density  and  its  moisture  content,  so  that  in 
studying  the  mechanical  properties  of  the  material  a  regression  relationship 
with  these  two  variables  would  be  sought.  For  most  practical  purposes  a 
linear  regression  relation  will  be  preferred. 

In general, therefore, with a dependent variable y and p independent
variables $x_1, x_2, \ldots, x_p$, we shall seek a relationship of the form

$$E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p.$$

Before  setting  out  to  determine  a  multiple  regression  equation,  it  is 
worthwhile  to  give  some  thought  to  the  selection  of  independent  variables. 
First  of  all,  it  is  worthwhile  to  include  only  those  variables  that  are  thought 
likely  to  make  an  important  contribution  to  the  effectiveness  of  the 
relationship.  Secondly,  independent  variables  that  are  readily  measurable 
or  observable  should  be  selected,  both  so  that  they  can  be  used  in  deriving 
the  estimated  relationship  and  also,  since  the  relationship  may  be  required 
for  later  use  in  prediction,  so  that  values  can  be  determined  for  this 
purpose. 

It is undesirable to include too many variables in the regression equation,
first,  because  three  or  four  variables  if  suitably  chosen  will  generally 
provide  a  satisfactory  relationship,  second,  because  the  work  of  calculation 
increases  rapidly  with  the  number  of  variables,  and  third,  because  an 
equation  with  many  variables  in  it  can  seldom  be  easily  applied  in 
subsequent  prediction. 

3.2    ESTIMATION OF MULTIPLE REGRESSION

The  same  principles  of  estimation  and  testing  of  significance  apply  with 
multiple  regression  as  apply  with  simple  regression.  If  we  have  a  sample 
of  n  sets  of  values  of  the  one  dependent  and  the  p  independent  variables, 
the  sum  of  squares  to  be  minimized  is 

s(y  -  A  -  P&  -  •  •  •  -  pvxvf. 

We  introduce  at  this  stage  the  notation 
u  =  S(y  -  yf 
Pi  =  Sy(Xi  -  4) 
ta  =  S(x,  -  xtf 

hi  =  $\xh  ~  %h)\xi  ~  xi)- 

The sum of squares is then

\[ u - \sum_i \beta_i p_i + \sum_i \beta_i (\beta_1 t_{i1} + \beta_2 t_{i2} + \cdots + \beta_p t_{ip} - p_i) . \qquad (3.1) \]

Then the normal equations found by differentiating (3.1) and replacing
the $\beta_i$ by their estimates $b_i$ take the form

\[ b_1 t_{i1} + b_2 t_{i2} + \cdots + b_p t_{ip} = p_i \qquad (i = 1, 2, \ldots, p), \]

or

\[ \sum_h b_h t_{ih} = p_i . \qquad (3.2) \]

The minimum value of the sum of squares of residuals, obtained when the
constants $\beta_i$ are given the values $b_i$, is

\[ u - \sum_i b_i p_i . \]

This sum of squares, with $n - p - 1$ degrees of freedom, provides an
estimate of the residual variance of departures from the regression equation
and will accordingly be equated to $(n - p - 1)s^2$. The sum of squares
attributable to regression is

\[ \sum_i b_i p_i , \]

with $p$ degrees of freedom.
The analysis of variance takes the form

              D.F.          Sum of Squares                          Mean Square

Regression    p             \sum_i b_i p_i                          \sum_i b_i p_i / p

Residual      n - p - 1     u - \sum_i b_i p_i = (n - p - 1)s^2     s^2

Total         n - 1         u


which  provides  a  test  of  the  over-all  significance  of  the  regression.  Usually 
the  over-all  significance  will  not  be  in  doubt,  but  more  detailed  tests  of 
the  individual  regression  coefficients  will  be  required,  in  order  to  assess 
the  reality  of  the  contribution  of  each  variable.  The  ratio  of  the  regression 
sum  of  squares  to  the  total  sum  of  squares  (i.e.,  the  fraction  of  the  total 
sum  of  squares  accounted  for  by  the  regression)  is  sometimes  called  the 
coefficient of determination and denoted by $R^2$. Its positive square root
$R$ is known as the coefficient of multiple correlation.
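The analysis of variance and the coefficient of determination can be sketched in a few lines. The function name, the dictionary keys, and the numerical inputs below are hypothetical; only the formulas are taken from the text.

```python
def regression_anova(u, b, p, n):
    """Analysis of variance for a multiple regression.

    u: total sum of squares; b: regression coefficients b_i;
    p: sums of products p_i; n: number of observations.
    """
    k = len(b)                                  # number of independent variables
    ss_reg = sum(bi * pi for bi, pi in zip(b, p))
    ss_res = u - ss_reg
    s2 = ss_res / (n - k - 1)                   # residual mean square
    return {
        "ss_regression": ss_reg,
        "ss_residual": ss_res,
        "s2": s2,
        "F": (ss_reg / k) / s2,                 # over-all test of the regression
        "R2": ss_reg / u,                       # coefficient of determination
    }


# Hypothetical values for illustration only.
result = regression_anova(u=26.0, b=[2.0, -1.0], p=[11.0, -3.0], n=4)
print(result)
```

The ratio `F` is referred to the $F$ distribution with $p$ and $n - p - 1$ degrees of freedom.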

So far nothing has been said about the question of actually calculating
the $b_i$. This question, and the determination of the standard errors, can
be dealt with together. The equations (3.2) may be solved expeditiously
by determining the reciprocal (inverse) of the matrix $T = (t_{hi})$. Various
methods are advocated in the literature. In Example 3.1 we employ
the method of Crout (1941), although other methods may be equally
speedy. It is well, however, to standardize on one particular method
unless the matrix to be inverted has special features that render it
particularly amenable to some special method.

The inverse of the matrix $T$ will be denoted by $T^{-1}$ and its elements by
$t^{hi}$. Then the $b_i$ are given by the equation

\[ b_i = \sum_h p_h t^{hi} . \]

It is readily shown that the variance of $b_i$ is

\[ s^2 t^{ii} \]

and that the covariance of any two coefficients, $b_h$ and $b_i$, is

\[ s^2 t^{hi} . \]

An immediate test may therefore be made of the significance of each
regression coefficient, as soon as $s^2$ has been determined from the analysis
of variance. Also, if the variables $x_h$ and $x_i$ are of the same kind, and it is
considered likely that the regression coefficients $b_h$ and $b_i$ will not differ
significantly, their difference may be tested against its standard error

\[ s\sqrt{t^{hh} - 2t^{hi} + t^{ii}} . \]

If it is valid to assume that the two regression coefficients are equal and if
the significance test does not contradict this assumption, an average
regression coefficient may be determined, as

\[ \frac{b_h(t^{ii} - t^{hi}) + b_i(t^{hh} - t^{hi})}{t^{hh} - 2t^{hi} + t^{ii}} , \]

with variance

\[ \frac{s^2[t^{hh}t^{ii} - (t^{hi})^2]}{t^{hh} - 2t^{hi} + t^{ii}} . \]
If such an average regression coefficient were used, the sum of squares
for regression would be reduced by the amount accounted for by the
difference, namely

\[ \frac{(b_h - b_i)^2}{t^{hh} - 2t^{hi} + t^{ii}} . \]
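The comparison and pooling of two coefficients can be sketched as below. The function name and the input values are hypothetical; the arguments `t_hh`, `t_hi`, `t_ii` stand for the corresponding elements of the inverse matrix.

```python
import math


def pooled_coefficient(b_h, b_i, t_hh, t_hi, t_ii, s2):
    """Compare two regression coefficients and form their average.

    t_hh, t_hi, t_ii are elements of the inverse matrix; s2 is the
    residual mean square.
    """
    d = t_hh - 2.0 * t_hi + t_ii
    return {
        # standard error against which b_h - b_i is tested
        "se_difference": math.sqrt(s2 * d),
        # average coefficient, weighted to have minimum variance
        "pooled": (b_h * (t_ii - t_hi) + b_i * (t_hh - t_hi)) / d,
        "pooled_variance": s2 * (t_hh * t_ii - t_hi ** 2) / d,
        # reduction in the regression sum of squares if the average is used
        "ss_reduction": (b_h - b_i) ** 2 / d,
    }


# Hypothetical inputs: uncorrelated coefficients of equal precision.
r = pooled_coefficient(b_h=1.0, b_i=3.0, t_hh=1.0, t_hi=0.0, t_ii=1.0, s2=1.0)
print(r)
```

When the two coefficients are uncorrelated and equally precise, as in this toy case, the average reduces to their simple mean.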


TABLE 3.1

Records of Ballarat Dew Retting Trials 1942-1943, on Variety Liral Crown

Mean Daily Rainfall    Retting    Mean Maximum Daily     Ret Loss, y,
during Retting         Period,    Temperature during     per cent
Period, x1,            x2,        Retting Period, x3,    (two samples)
0.01 in.               days       °F

    4.3                  62          78                   17.3    15.8
    4.5                  68          78                   17.6    18.9
    4.3                  74          78                   15.0    15.8
    6.1                  71          78                   18.1    17.6
    5.6                  78          78                   18.7    18.7
    5.6                  85          77                   17.9    19.2
    6.1                  69          76                   18.4    16.7
    5.5                  76          76                   18.1    17.5
    5.0                  83          76                   16.3    19.1
    5.6                  70          76                   19.4    17.5
    5.2                  77          76                   17.6    18.3
    4.8                  84          75                   19.5    18.7
    3.8                  63          77                   12.7    16.8
    3.4                  70          76                   17.0    15.8
    3.6                  77          75                   16.1    19.4
    3.9                  63          73                   17.3    16.1
    5.1                  70          71                   18.4    16.1
    5.9                  77          70                   17.3    18.2
    4.9                  63          68                   16.1    15.0
    4.6                  70          68                   15.9    14.3
    4.8                  77          66                   14.6    14.9
    4.9                  56          66                   16.5    14.2
    5.1                  63          65                   15.8    15.5
    5.4                  70          63                   19.5    19.4
    6.5                  49          62                   15.3    18.8
    6.8                  56          60                   17.4    18.7
    6.2                  63          60                   17.6    17.6

3.3    EXAMPLE SHOWING METHOD OF CALCULATION

Example  3.1  The  Effects  of  Rainfall,  Temperature,  and  Time  of 
Exposure  on  Ret  Loss  of  Flax.  The  data  given  in  Table  3.1  are  from  a 
dew-retting  experiment,  in  which  flax  was  laid  out  under  various  climatic 
conditions  and  for  various  periods.  Two  samples  were  taken  from  each  trial. 
The  mean  daily  rainfall,  mean  maximum  daily  temperature,  and  retting  period 
were  taken  as  the  independent  variables.  The  sums  of  squares  and  products  of 
deviations  from  the  mean  are  calculated  for  each  of  the  variables  and  are  set 
out in Table 3.2. The method of Crout (1941), which is one of the methods
also proposed by Banachiewicz, is used to invert the matrix and solve the
equations. We first apply Crout's method to the determination of the regression
coefficients and then show its more general application to the inversion of the
matrix T.


TABLE 3.2
Sums of Squares and Products of Values in Table 3.1

   40.2770      -65.489     -120.230       27.3193
  -65.489      4097.33      1561.11       236.078
 -120.230      1561.11      1943.70        66.193


3.4    EVALUATION  OF  REGRESSION  COEFFICIENTS 

The  first  steps  in  the  solution  consist  of  forming  an  auxiliary  matrix 
(Table  3.3)  from  the  original  matrix,  in  the  following  way. 


TABLE 3.3
The Auxiliary Matrix

   40.2770       -1.62597     -2.98508      0.6783
  -65.489      3990.85         0.342188     0.07029
 -120.230      1365.62      1117.51         0.04632


1. The order of computations is to determine, first, the elements of the
first column, then the remaining elements of the first row, the remaining
elements of the second column, the remaining elements of the second row,
and so on until the auxiliary matrix is complete.

2.  The  first  column  of  the  auxiliary  matrix  is  identical  with  the  first 
column  of  the  original  matrix. 

3.  The  first  row  of  the  auxiliary  matrix,  apart  from  the  first  term,  is 
given  by  dividing  the  corresponding  elements  in  the  original  matrix  by  the 
first  element  in  the  auxiliary  matrix. 

4.  More  generally,  each  element  on  or  below  the  principal  diagonal  is 
equal  to  the  corresponding  element  of  the  original  matrix  minus  the  sum  of 
products  of  elements  in  its  row  and  corresponding  elements  in  its  column  in 
the  auxiliary  matrix  that  involve  only  previously  computed  elements. 

5.  Each  element  to  the  right  of  the  principal  diagonal  is  given  by  a 
calculation  which  differs  from  step  4  only  in  that  there  is  a  final  division  by 
the  diagonal  element  in  the  auxiliary  matrix. 

For instance, the element in the third row and the second column

\[ = 1561.11 - (-120.230)(-1.62597) = 1365.62 , \]

whereas the element in the second row and the third column

\[ = [1561.11 - (-65.489)(-2.98508)]/3990.85 = 1365.62/3990.85 = 0.342188 . \]

Each  element  to  the  right  of  the  principal  diagonal  is  seen  to  be  equal 
to  the  corresponding  element  below  the  principal  diagonal,  divided  by  a 
diagonal  element;  this  fact  reduces  the  computations  considerably  when 
the  original  matrix  for  the  independent  variables  is  symmetric.  It  is  to  be 
noted  that  the  operations  on  the  rows  are  carried  right  through  to  the 
final  column,  which  is  that  of  sums  of  products  with  the  dependent  variable. 

The  remaining  step  is  to  calculate  the  final  matrix,  which  actually 
consists  of  the  column  of  partial  regression  coefficients.  The  elements  are 
determined  in  reverse  order  to  the  elements  of  the  auxiliary  matrix.  We 
begin  with  the  last  element  in  the  last  column,  in  the  auxiliary  matrix, 
which  becomes  the  last  element  in  the  final  matrix.  Each  other  element  in 
the  final  matrix  is  equal  to  the  corresponding  element  of  the  last  column  of 
the  auxiliary  matrix,  minus  the  sum  of  products  of  elements  in  its  row 
in  the  auxiliary  matrix  and  corresponding  elements  in  its  column  in  the 
final  matrix  that  have  been  previously  computed. 

As  Crout  points  out,  in  forming  products  in  the  final  matrix  only  those 
elements  of  the  auxiliary  matrix  are  used  that  lie  to  the  right  of  the 
principal  diagonal  and  to  the  left  of  the  final  column. 


Thus, in the example, the final matrix is

\[ \begin{bmatrix} 0.9051 \\ 0.05444 \\ 0.04632 \end{bmatrix} , \qquad (3.3) \]

the elements of which are the partial regression coefficients of ret loss on rainfall,
time, and temperature respectively.
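The steps of Crout's method described above can be sketched as follows, using the values of Table 3.2. The function names are illustrative assumptions; the order of operations (column, then row, for each diagonal position) follows the numbered steps.

```python
def crout_auxiliary(A):
    """Form the auxiliary matrix from an augmented matrix A (n rows, more than n columns)."""
    n, m = len(A), len(A[0])
    aux = [[0.0] * m for _ in range(n)]
    for k in range(n):
        # Elements on or below the principal diagonal in column k (step 4).
        for i in range(k, n):
            aux[i][k] = A[i][k] - sum(aux[i][j] * aux[j][k] for j in range(k))
        # Elements to the right of the diagonal in row k, carried through
        # to the final column, with a final division by the diagonal (step 5).
        for j in range(k + 1, m):
            aux[k][j] = (A[k][j] - sum(aux[k][r] * aux[r][j] for r in range(k))) / aux[k][k]
    return aux


def crout_back_substitute(aux):
    """Form the final matrix (the solution), working from the last element upward."""
    n = len(aux)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = aux[i][n] - sum(aux[i][j] * x[j] for j in range(i + 1, n))
    return x


# Table 3.2: sums of squares and products, with the products with y as final column.
A = [
    [40.2770, -65.489, -120.230, 27.3193],
    [-65.489, 4097.33, 1561.11, 236.078],
    [-120.230, 1561.11, 1943.70, 66.193],
]
aux = crout_auxiliary(A)
b = crout_back_substitute(aux)
print(b)  # compare with the final matrix (3.3)
```

Only the elements of the auxiliary matrix to the right of the principal diagonal enter the back-substitution, in agreement with Crout's remark quoted above.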

It  is  often  desirable  to  carry  a  check  column  to  ensure  accuracy  at  each  stage 
of  the  work.  In  the  original  matrix  each  element  of  the  check  column  is  the 
sum  of  the  elements  of  the  corresponding  row.  The  check  column  is  treated  in 
the same way as the final column of the matrix. The check columns are:

Original Matrix     Auxiliary Matrix     Final Matrix

   -118.123            -2.93276            1.90509
   5829.037             1.41247            1.05444
   3450.773             1.04632            1.04632

For  the  auxiliary  matrix  the  check  column  checks  the  totals  of  elements 
to  the  right  of  the  principal  diagonal  in  each  row. 

3.5    INVERSION  OF  A  SQUARE  MATRIX 

As  Fisher  (1954)  points  out,  it  is  generally  desirable  to  calculate,  in 
addition  to  the  actual  regression  coefficients,  the  solutions  of  three  sets  of 
simultaneous  equations  obtained  by  replacing  the  final  column  of  Table 
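3.2 by, respectively,

\[ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} , \quad \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} , \quad \text{and} \quad \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} . \]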

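The solutions of these three sets of equations actually make up the
inverse of the matrix of sums of squares and products of the independent
variables. In other words, if the solutions are

\[ \begin{bmatrix} t^{11} \\ t^{12} \\ t^{13} \end{bmatrix} , \quad \begin{bmatrix} t^{12} \\ t^{22} \\ t^{23} \end{bmatrix} , \quad \text{and} \quad \begin{bmatrix} t^{13} \\ t^{23} \\ t^{33} \end{bmatrix} , \]

then for Example 3.1

\[ \begin{bmatrix} 40.2770 & -65.489 & -120.230 \\ -65.489 & 4097.33 & 1561.11 \\ -120.230 & 1561.11 & 1943.70 \end{bmatrix} \begin{bmatrix} t^{11} & t^{12} & t^{13} \\ t^{12} & t^{22} & t^{23} \\ t^{13} & t^{23} & t^{33} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} . \qquad (3.4) \]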


These  three  sets  of  equations  can  be  solved  in  the  same  way  as  the  single 
set  was  solved,  the  three  sets  of  solutions  being  arrived  at  simultaneously. 
The  auxiliary  matrix  for  this  calculation  is  shown  in  Table  3.4. 
Each column of this matrix is obtained from the corresponding column
of the matrix on the right-hand side of (3.4) in the same way as the final
column of Table 3.3 was obtained from the final column of Table 3.2. The
final matrix, or inverse matrix, is given in Table 3.5.


TABLE 3.4
The Extended Auxiliary Matrix

   40.2770       -1.62597     -2.98508   |   0.0248281      0                0
  -65.489      3990.85         0.342188  |   0.00040742     0.000250573      0
 -120.230      1365.62      1117.51      |   0.00217331    -0.000306206      0.000894847


The  symmetry  of  the  elements  of  the  inverse  matrix  about  the  principal 
diagonal  provides  a  check  on  the  accuracy  of  the  computations;  if 
desired,  a  check  column  may  be  carried  also,  to  ensure  the  accuracy  of 
each  step. 
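The simultaneous solution of the three sets of equations can be sketched by carrying the columns of the unit matrix through the same scheme. The function name is an illustrative assumption; the input is the matrix of sums of squares and products of the independent variables from Table 3.2.

```python
def crout_inverse(T):
    """Invert a square matrix by Crout's scheme with unit columns on the right."""
    n = len(T)
    # Original matrix extended by the columns of the unit matrix, as in (3.4).
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(T)]
    aux = [[0.0] * (2 * n) for _ in range(n)]
    for k in range(n):
        for i in range(k, n):            # column k, on or below the diagonal
            aux[i][k] = A[i][k] - sum(aux[i][j] * aux[j][k] for j in range(k))
        for j in range(k + 1, 2 * n):    # row k, right of the diagonal
            aux[k][j] = (A[k][j] - sum(aux[k][r] * aux[r][j] for r in range(k))) / aux[k][k]
    # Back-substitution, one right-hand column at a time.
    inv = [[0.0] * n for _ in range(n)]
    for c in range(n):
        for i in range(n - 1, -1, -1):
            inv[i][c] = aux[i][n + c] - sum(aux[i][j] * inv[j][c] for j in range(i + 1, n))
    return aux, inv


T = [
    [40.2770, -65.489, -120.230],
    [-65.489, 4097.33, 1561.11],
    [-120.230, 1561.11, 1943.70],
]
aux, inv = crout_inverse(T)
for row in inv:
    print([round(v * 1e6, 3) for v in row])  # compare with Table 3.5 (units of 10^-6)
```

The right-hand half of `aux` corresponds to the right-hand block of Table 3.4, and the symmetry of `inv` gives the check mentioned above.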

TABLE 3.5
The Inverse Matrix

             30,768.9       -336.26       2,173.31
10^-6  x       -336.26       355.353       -306.206
              2,173.31      -306.206        894.847


The use of the inverse matrix has several advantages. First, the
regression coefficients are obtainable directly by multiplying the successive
sums of products of the dependent variable with the independent variables
by the corresponding elements in a column of the inverse matrix.
Thus, the regression coefficient of ret loss on time is

\[ 10^{-6}(-336.26 \times 27.3193 + 355.353 \times 236.078 - 306.206 \times 66.193) = 0.05444 , \]

which agrees with the previous result.
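A short sketch of this use of the inverse matrix, with the values of Table 3.5 and the sums of products from Table 3.2, is given below. The variable names are illustrative.

```python
# Inverse matrix of Table 3.5 (entries are in units of 10^-6).
inv = [
    [30768.9e-6, -336.26e-6, 2173.31e-6],
    [-336.26e-6, 355.353e-6, -306.206e-6],
    [2173.31e-6, -306.206e-6, 894.847e-6],
]
# Sums of products of ret loss with rainfall, time, and temperature (Table 3.2).
p = [27.3193, 236.078, 66.193]

# b_i = sum over h of p_h * t^{hi}
b = [sum(p[h] * inv[h][i] for h in range(3)) for i in range(3)]
print(b)  # compare with (3.3)
```

For a second dependent variable observed on the same trials, only the list `p` would change; the inverse matrix is reused as it stands.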

If,  as  often  happens,  the  regression  of  a  number  of  dependent  variables 
on  the  same  set  of  values  of  the  independent  variables  is  required,  the 
determination  of  the  inverse  matrix  enables  the  calculation  of  the  regression 
coefficients  for  each  such  variable  to  be  performed  without  the  solution  of 
a  set  of  equations  in  each  case. 

Second,  as  described  earlier,  the  standard  errors  of  the  regression 
coefficients,  and  the  correlations  among  them,  are  readily  obtained  from 
the  inverse  matrix. 


3.6    STANDARD ERRORS OF THE REGRESSION COEFFICIENTS

To  determine  the  standard  errors  of  the  regression  coefficients,  it  is 
necessary  to  eliminate  from  the  variation  in  the  dependent  variable  the 
part  that  is  attributable  to  the  independent  variables.  The  procedure  for 
this  analysis  of  variance  is  to  calculate  the  sum  of  squares  for  regression, 
which  is  the  sum  of  products  of  the  regression  coefficients  and  the  corre- 
sponding sums  of  products  of  the  dependent  variable  with  each  independent 
variable. 

In  the  present  example  the  regression  sum  of  squares,  with  three  degrees  of 
freedom,  is 

\[ 27.3193 \times 0.9051 + 236.078 \times 0.05444 + 66.193 \times 0.04632 = 40.64 . \qquad (3.5) \]

The  analysis  of  variance  is  set  out  in  Table  3.6. 

TABLE 3.6
Analysis of Variance of Data in Table 3.1

               D.F.     Sum of Squares     Mean Square

Regression       3          40.64            13.55
Residual        23          55.52             2.414
Duplicates      27          41.11             1.523

Total           53         137.27



The  effect  of  the  three  weather  variates  on  ret  loss  is  highly  significant.  The 
standard  errors  of  the  three  regression  coefficients  are,  in  order, 

\[ \sqrt{2.414 \times 30{,}768.9 \times 10^{-6}} = 0.273 \]
\[ \sqrt{2.414 \times 355.353 \times 10^{-6}} = 0.0293 \qquad (3.6) \]
\[ \sqrt{2.414 \times 894.847 \times 10^{-6}} = 0.0465 \]

Comparison of the regression coefficients (3.3) with their standard errors (3.6)
reveals that the effect of rainfall is significant at the 1 per cent level and that of
time is significant at the 5 per cent level, whereas that of temperature is not
significant.
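The standard errors, and the ratios of the coefficients to them, can be sketched as below from the residual mean square of Table 3.6 and the diagonal of Table 3.5. The variable names are illustrative; the ratios are to be referred to the $t$ distribution with 23 degrees of freedom.

```python
import math

s2 = 2.414                                        # residual mean square, Table 3.6
inv_diag = [30768.9e-6, 355.353e-6, 894.847e-6]   # t^{ii}, Table 3.5
b = [0.9051, 0.05444, 0.04632]                    # coefficients (3.3)

# Standard error of b_i is sqrt(s^2 * t^{ii}).
se = [math.sqrt(s2 * d) for d in inv_diag]
# Ratio of each coefficient to its standard error.
t_ratios = [bi / sei for bi, sei in zip(b, se)]

for name, bi, sei, ti in zip(["rainfall", "time", "temperature"], b, se, t_ratios):
    print(f"{name:12s} b = {bi:.5f}  s.e. = {sei:.4f}  ratio = {ti:.2f}")
```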

These  standard  errors  and  the  significance  tests  based  on  them  should, 
however,  be  interpreted  with  care.  The  significance  tested  is  actually  that 
of  the  additional  amount  of  variation  in  ret  loss  accounted  for  by  the 
variable  considered,  above  that  accounted  for  by  the  remaining  variables. 
Thus, for temperature, all that can be said is that the fit of the regression
on  rainfall  and  time,  ignoring  temperature,  is  not  significantly  worse  than 
the  fit  of  the  regression  on  all  three  variables.  In  other  words,  the  data 
supply  no  evidence  that  any  effect  that  temperature  may  have  is  not 
adequately  represented  by  the  effect  of  rainfall  and  time. 

The  significance  of  the  partial  regression  coefficients  may  also  be  tested 
by  means  of  an  analysis  of  variance.  The  sum  of  squares  for  the  variation 
in  ret  loss  accounted  for  by  each  individual  variate,  additional  to  that 
accounted  for  by  the  other  two,  is  found  by  dividing  the  square  of  the 
partial  regression  coefficient  by  the  corresponding  diagonal  term  of  the 
inverse  matrix : 

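Sums of Squares for Partial Regression

\[ \text{Rainfall:} \quad \frac{10^6 \times 0.9051^2}{30{,}768.9} = 26.62 \]

\[ \text{Time:} \quad \frac{10^6 \times 0.05444^2}{355.353} = 8.34 \]

\[ \text{Temperature:} \quad \frac{10^6 \times 0.04632^2}{894.847} = 2.40 \]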

The  test  of  significance  on  these  sums  of  squares  is  equivalent  to  that 
on  the  partial  regression  coefficients.  It  will  be  noted  that  these  three 
sums  of  squares  do  not  total  to  the  sum  of  squares  for  regression  given 
in  Table  3.6.  This  is  because  the  effects  of  the  three  factors  are  not 
independent.  By  subtracting  the  temperature  sum  of  squares  from  the 
total  regression  sum  of  squares,  we  obtain  the  sum  of  squares  for  regression 
on  rainfall  and  time,  38.24  with  two  degrees  of  freedom,  which  is  clearly 
highly  significant. 
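These partial sums of squares, and the subtraction that yields the regression on rainfall and time, can be sketched as follows. The variable names are illustrative; the inputs are taken from (3.3), Table 3.5, and Table 3.6.

```python
b = [0.9051, 0.05444, 0.04632]                    # coefficients (3.3)
inv_diag = [30768.9e-6, 355.353e-6, 894.847e-6]   # t^{ii}, Table 3.5
ss_regression = 40.64                             # Table 3.6

# Sum of squares for each variable, additional to the other two: b_i^2 / t^{ii}.
partial_ss = [bi ** 2 / d for bi, d in zip(b, inv_diag)]

# Regression on rainfall and time alone, by removing the temperature term.
ss_rain_time = ss_regression - partial_ss[2]

print([round(v, 2) for v in partial_ss], round(ss_rain_time, 2))
```

Each partial sum of squares, divided by the residual mean square, gives the square of the ratio of the coefficient to its standard error, which is why the two tests agree.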

If  we  were  to  consider  the  simple  regressions  of  ret  loss  on  each  factor, 
the  other  two  factors  being  ignored,  we  should  require  a  different  set  of 
sums  of  squares  for  testing  significance : 

Sums of Squares for Simple Regression

\[ \text{Rainfall:} \quad \frac{27.3193^2}{40.2770} = 18.53 \]

\[ \text{Time:} \quad \frac{236.078^2}{4097.33} = 13.60 \]

\[ \text{Temperature:} \quad \frac{66.193^2}{1943.70} = 2.25 \]

This shows that the temperature effect is not significant, even when other
factors are ignored.

3.7    SHORT  CUTS  USING  THE  AUXILIARY  MATRIX 

For  a  rapid  test  of  the  over-all  significance  of  the  regression  on  the 
three  variables,  it  will  be  found  that  the  regression  sum  of  squares  can  be 
obtained  from  the  auxiliary  matrix,  as  the  sum  of  products  of  the  diagonal 
elements  with  the  squares  of  the  corresponding  elements  of  the  final 
column ;   that  is, 

\[ 40.2770 \times 0.6783^2 + 3990.85 \times 0.07029^2 + 1117.51 \times 0.04632^2 \]
\[ = 18.53 + 19.71 + 2.40 \qquad (3.7) \]
\[ = 40.64 . \]

Now  in  the  expression  (3.7)  the  first  term  is  identified  as  the  sum  of  squares 
for  regression  on  rainfall,  the  other  two  factors  being  ignored.  The  last 
term  is  the  sum  of  squares  for  regression  on  temperature,  additional  to 
that  for  regression  on  rainfall  and  time.  It  can  be  shown  that  the  middle 
term  is  the  sum  of  squares  for  regression  on  time,  additional  to  that  for 
regression  on  rainfall  alone.  In  other  words,  the  successive  partial  sums 
of  (3.7)  are  the  sums  of  squares  for  regression  on  one,  two,  and  three 
factors,  taken  in  the  given  order.  This  enables  us  to  test  the  significance 
of  adding  new  variables  in  the  regression. 
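The short cut can be sketched directly from the diagonal and final column of the auxiliary matrix (Table 3.3). The variable names are illustrative; because the tabulated elements are rounded, the individual terms may differ from (3.7) in the last figure.

```python
aux_diag = [40.2770, 3990.85, 1117.51]   # diagonal of Table 3.3
aux_last = [0.6783, 0.07029, 0.04632]    # final column of Table 3.3

# Each term is the sum of squares added by a variable, given those before it.
terms = [d * c ** 2 for d, c in zip(aux_diag, aux_last)]

# Successive partial sums: regression on one, two, and three variables.
cumulative = []
running = 0.0
for t in terms:
    running += t
    cumulative.append(running)

print([round(t, 2) for t in terms], [round(c, 2) for c in cumulative])
```

The second partial sum reproduces the sum of squares for regression on rainfall and time, and the last reproduces the regression sum of squares of Table 3.6.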

From  similar  considerations  it  can  be  shown  that  the  successive  elements 
of  the  last  column  of  the  auxiliary  matrix  are  the  regression  coefficients  in 
the  regression  involving  only  the  variates  corresponding  to  the  previous 
elements  of  the  column.  Thus,  the  first  element  (0.6783)  is  the  simple 
regression  coefficient  of  ret  loss  on  rainfall ;  0.07029  is  the  partial  regression 
coefficient  on  time,  the  effect  of  rainfall  being  eliminated ;  and  0.04632  is 
the  partial  regression  coefficient  on  temperature,  time  and  rainfall  being 
eliminated. 

These  facts  may  be  proved  by  considering  the  method  of  formation  of 
the  auxiliary  matrix.  Also  it  may  be  shown  that  the  process  of  forming 
the  final  matrix  from  the  auxiliary  matrix  is  equivalent  to  eliminating 
from  the  regression  coefficient  on  each  variable  the  effect  of  the  subsequent 
variables.  The  effect  of  the  preceding  variables  having  been  eliminated  in 
the  auxiliary  matrix,  and  that  of  the  subsequent  variables  in  the  final 
matrix,  it  follows  that  the  elements  obtained  are  the  required  partial 
regression  coefficients.  A  proof  of  the  method  of  Crout  along  these 
lines,  although  less  general  in  application,  would  be  more  closely  linked 
with  statistical  concepts  than  the  proof  given  by  him,  which  is  an  inductive 
one. 
3.8    ADDITION OR OMISSION OF INDEPENDENT VARIABLES BY CROUT'S METHOD

Methods  of  including  or  excluding  an  independent  variable  in  multiple 
linear  regression  have  been  developed  by  Cochran  (1938).  These  methods 
obviate  the  need  for  resolving  the  normal  equations  each  time  a  variable 
is  added  or  omitted,  simple  adjustments  to  the  existing  inverse  matrix  and 
regression  coefficients  being  all  that  are  required. 

The  development  of  the  auxiliary  matrix  in  Crout's  method  makes  the 
addition  of  new  independent  variables  extremely  simple.  To  the  original 
matrix  are  added  rows  and  columns  of  sums  of  squares  and  products  for 
the  new  variables.  The  existing  elements  of  the  auxiliary  matrix  are 
unaffected;  the  new  rows  and  columns,  corresponding  to  the  new  rows 
and  columns  in  the  original  matrix,  are  then  calculated  in  exactly  the  same 
way  as  the  existing  elements  of  the  auxiliary  matrix  were.  From  this  new 
auxiliary  matrix  the  final  matrix  of  regression  coefficients  may  be  calculated, 
but  before  this  is  done  the  significance  of  each  additional  variate  in  the 
regression  can  be  tested  by  the  method  previously  described,  using  the  new 
elements  of  the  auxiliary  matrix. 

Moreover,  the  final  variable  to  be  included  is  the  one  for  which  the 
auxiliary  matrix  gives  the  completely  adjusted  partial  regression  coefficient. 
This  gives  immediately  the  magnitude  of  the  effect  of  the  new  variable,  as 
well  as  its  significance. 

The  omission  of  independent  variables  is  even  simpler,  provided  the 
variables  to  be  omitted  are  the  final  ones  considered.  From  the  auxiliary 
matrix  are  omitted  those  rows  and  columns  corresponding  to  the  omitted 
variates,  and  the  final  matrix  is  then  determined  as  before.  Thus,  if 
temperature  were  to  be  omitted  from  the  regression,  the  auxiliary  matrix 
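of Table 3.4 would become

\[ \begin{bmatrix} 40.2770 & -1.62597 \\ -65.489 & 3990.85 \end{bmatrix} \qquad \begin{bmatrix} 0.0248281 & 0 \\ 0.00040742 & 0.000250573 \end{bmatrix} \]

and the final inverse matrix

\[ 10^{-6} \times \begin{bmatrix} 25{,}490.6 & 407.42 \\ 407.42 & 250.573 \end{bmatrix} , \]

giving the regression coefficients

\[ b_1' = 0.7926 , \qquad b_2' = 0.07029 . \]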

Tests  for  the  significance  of  each  omitted  variable  would  be  made  on 
sums  of  squares  derived  from  the  auxiliary  matrix. 
When  the  variable  to  be  omitted  is  not  the  final  one,  the  procedure  is 
not quite so simple, for the regression coefficients on the subsequent
variables,  which  are  adjusted  for  that  variable  in  the  auxiliary  matrix,  have 
to  be  restored.  Moreover,  since  the  final  and  inverse  matrices  must  be 
determined  before  the  significance  of  the  partial  regression  for  the  omitted 
variable  can  be  tested,  it  is  probably  best  to  use  the  method  to  be  described 
next  in  such  cases.  It  is  clearly  desirable,  if  variables  are  to  be  omitted,  to 
arrange  them,  as  far  as  possible,  in  order  of  decreasing  importance;  this 
can  sometimes,  although  not  in  general,  be  determined  from  the  magnitude 
of the sums of squares for simple regressions.

Although  the  methods  given  here  for  addition  or  omission  of  variates 
are,  in  principle,  the  same  as  those  given  by  Cochran,  the  procedure  of 
working  through  the  auxiliary  matrix  greatly  simplifies  the  calculation  of 
the inverse matrix. The method also lends itself more readily to
generalization when a large number of variables is to be added or omitted.

3.9    THE  OMISSION  OR  INCLUSION  OF  AN 
INDEPENDENT  VARIABLE  IN  GENERAL 

Besides  the  previously  described  method  of  eliminating  an  independent 
variable  or  including  a  new  one,  based  on  operations  with  the  auxiliary 
matrix,  there  is  a  more  general  method  which  does  not  depend  on  the 
order  in  which  the  variables  are  excluded  or  included,  and  which  was  first 
described  systematically  by  Cochran  (1938).  The  general  method  is  to 
make simple adjustments to the inverse matrix and the regression
coefficients and is accordingly very easy to apply.

(i)  Omission  of  a  Variable 

Suppose that there are $p$ independent variables and that the one to be
omitted is designated the $p$th. The elements of the inverse matrix of $p$
variables will be denoted $t^{hi}$ and the adjusted elements $t'^{hi}$; likewise the
old and new regression coefficients will be denoted $b_i$ and $b_i'$ respectively.
Then the adjustment formulas are

\[ t'^{hi} = t^{hi} - t^{hp} t^{ip} / t^{pp} \]

and

\[ b_i' = b_i - b_p t^{ip} / t^{pp} . \]

The second of these formulas is actually a simple consequence of the first.
The reduction in the regression sum of squares due to the omission of the
variable is clearly

\[ b_p^2 / t^{pp} . \]
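The adjustment formulas for omission can be sketched as below and checked against Example 3.1 by omitting temperature. The function name and argument names are illustrative assumptions; indices are counted from zero in the code.

```python
def omit_variable(inv, b, k):
    """Drop variable k (0-based) from a fitted regression.

    inv: inverse matrix (elements t^{hi}); b: regression coefficients.
    Returns the adjusted inverse, adjusted coefficients, and the
    reduction in the regression sum of squares.
    """
    t_kk = inv[k][k]
    keep = [i for i in range(len(b)) if i != k]
    new_inv = [[inv[h][i] - inv[h][k] * inv[i][k] / t_kk for i in keep] for h in keep]
    new_b = [b[i] - b[k] * inv[i][k] / t_kk for i in keep]
    ss_reduction = b[k] ** 2 / t_kk
    return new_inv, new_b, ss_reduction


# Inverse matrix of Table 3.5 and coefficients (3.3); omit temperature (index 2).
inv = [
    [30768.9e-6, -336.26e-6, 2173.31e-6],
    [-336.26e-6, 355.353e-6, -306.206e-6],
    [2173.31e-6, -306.206e-6, 894.847e-6],
]
b = [0.9051, 0.05444, 0.04632]
new_inv, new_b, ss_reduction = omit_variable(inv, b, 2)
print([[round(v * 1e6, 2) for v in row] for row in new_inv], new_b, ss_reduction)
```

The result may be compared with the two-variable inverse matrix and coefficients obtained in Section 3.8 by truncating the auxiliary matrix; unlike that method, this one does not require the omitted variable to be the last.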

(ii)  Inclusion  of  a  Variable 

We denote the new variable by $x_r$ ($r = p + 1$) and, as before, the
adjusted elements by a prime. In this case, besides adjusting the existing
elements, we must also determine the new elements $t'^{ir}$ and the new
regression coefficient $b_r'$. We therefore first determine the new elements:

\[ t'^{rr} \Bigl( t_{rr} - \sum_h \sum_i t^{hi} t_{hr} t_{ir} \Bigr) = 1 ; \]

then

\[ t'^{hr} = -t'^{rr} \sum_i t^{hi} t_{ir} \]

and

\[ t'^{hi} = t^{hi} - t'^{hr} \sum_k t^{ik} t_{kr} = t^{hi} + t'^{hr} t'^{ir} / t'^{rr} . \]

The $b_i'$ may then be found, beginning with $b_r'$, from the following
formulas:

\[ b_r' = \sum_i p_i t'^{ir} , \]

\[ b_i' = b_i + b_r' t'^{ir} / t'^{rr} , \]

which are clearly the converses of the formulas given in the previous
subsection.

The increase in the regression sum of squares, due to fitting the additional
independent variable, is

\[ b_r'^2 / t'^{rr} . \]
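The formulas for inclusion can be sketched in the same way and checked by adding temperature to the regression on rainfall and time. The function and argument names are illustrative assumptions; `t_r` holds the sums of products $t_{hr}$ of the new variable with the existing ones, and `p_all` holds $p_1, \ldots, p_p, p_r$.

```python
def include_variable(inv, b, t_r, t_rr, p_all):
    """Add a new variable x_r to a fitted regression.

    inv: existing inverse (t^{hi}); b: existing coefficients;
    t_r: products t_{hr} of x_r with existing variables; t_rr: sum of
    squares of x_r; p_all: products with y, new variable last.
    """
    n = len(b)
    # w_h = sum over i of t^{hi} t_{ir}
    w = [sum(inv[h][i] * t_r[i] for i in range(n)) for h in range(n)]
    new_rr = 1.0 / (t_rr - sum(t_r[h] * w[h] for h in range(n)))
    new_hr = [-new_rr * w[h] for h in range(n)]
    # Adjust the existing elements and border the matrix with the new ones.
    new_inv = [[inv[h][i] + new_hr[h] * new_hr[i] / new_rr for i in range(n)] + [new_hr[h]]
               for h in range(n)]
    new_inv.append(new_hr + [new_rr])
    b_r = sum(p_all[i] * new_inv[i][n] for i in range(n + 1))
    new_b = [b[i] + b_r * new_hr[i] / new_rr for i in range(n)] + [b_r]
    return new_inv, new_b, b_r ** 2 / new_rr


# Two-variable results of Section 3.8; temperature sums from Table 3.2.
inv2 = [[25490.6e-6, 407.42e-6], [407.42e-6, 250.573e-6]]
b2 = [0.7926, 0.07029]
new_inv, new_b, ss_increase = include_variable(
    inv2, b2, t_r=[-120.230, 1561.11], t_rr=1943.70,
    p_all=[27.3193, 236.078, 66.193])
print(new_b, ss_increase)  # compare with (3.3) and the last term of (3.7)
```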

3.10    FIDUCIAL  LIMITS  FOR  THE  REGRESSION 
COEFFICIENTS 

As  in  simple  linear  regression,  fiducial  limits  may  be  determined  for 
each  of  the  coefficients  in  multiple  regression.  For  each  coefficient 
considered  separately,  the  fiducial  limits  will  be  based  on  its  standard 
error and the appropriate probability level of the $t$ distribution; thus, for
$b_i$ the limits are

\[ b_i \pm t\,s\sqrt{t^{ii}} . \]

Such  limits  for  an  individual  coefficient  are  important  when  its  departure 
from  some  hypothetical  value  is  under  test — in  particular,  for  the  test  that 
the  coefficient  is  zero.  In  multiple  regression,  however,  it  is  often  of 
interest  to  have  simultaneous  limits  for  all  the  coefficients,  or  at  any  rate 
for  a  set  of  them ;  for  it  is  clear  that  the  limits  attained  by  each  of  the 
coefficients  separately  could  not  be  attained  by  them  simultaneously  at  the 
same  significance  level.  These  fiducial  limits  are  readily  determined  in 
principle,  although  their  detailed  calculation  may  be  difficult.  Since  the 
limits  will  consist  of  a  curve  or  surface  rather  than  a  pair  of  points,  they 
may  be  more  clearly  described  as  a  fiducial  boundary.  The  region 
enclosed  will  be  called  a  fiducial  region,  as  the  generalization  of  a  fiducial 
range  in  one  dimension. 
Consider the quadratic form

\[ Q = \sum_h \sum_i (b_h - \beta_h)(b_i - \beta_i) t_{hi} ; \]

this is a sum of squares with $p$ degrees of freedom, so that the ratio $Q/ps^2$
is distributed as $F$ with $p$ and $n - p - 1$ degrees of freedom. Accordingly,
to find the fiducial boundary for the $\beta_i$ at a given probability level, we
take values of the $\beta_i$ to satisfy the equation

\[ Q = pFs^2 . \]

The concordance of a given set of $\beta_i$ with the data is similarly tested by
whether the inequality

\[ Q < pFs^2 \]

is satisfied.
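The test of concordance can be sketched as below for Example 3.1. The function names are illustrative, and the value 3.03 used for the 5 per cent point of $F$ with 3 and 23 degrees of freedom is an approximate tabulated value assumed here, not one quoted in the text.

```python
def quadratic_form(b, beta, T):
    """Q = sum_h sum_i (b_h - beta_h)(b_i - beta_i) t_hi."""
    d = [bi - be for bi, be in zip(b, beta)]
    n = len(d)
    return sum(d[h] * T[h][i] * d[i] for h in range(n) for i in range(n))


def inside_fiducial_region(b, beta, T, s2, f_crit):
    """True when the hypothetical coefficients beta satisfy Q < p F s^2."""
    return quadratic_form(b, beta, T) < len(b) * f_crit * s2


# Sums of squares and products of the independent variables (Table 3.2).
T = [
    [40.2770, -65.489, -120.230],
    [-65.489, 4097.33, 1561.11],
    [-120.230, 1561.11, 1943.70],
]
b = [0.9051, 0.05444, 0.04632]   # coefficients (3.3)
s2 = 2.414                       # residual mean square, Table 3.6
F_CRIT = 3.03                    # assumed approximate 5 per cent point, F(3, 23)

q_zero = quadratic_form(b, [0.0, 0.0, 0.0], T)
print(q_zero, inside_fiducial_region(b, [0.0, 0.0, 0.0], T, s2, F_CRIT))
```

With all the $\beta_i$ set to zero, $Q$ is simply the regression sum of squares, so that this test reduces to the over-all test of the analysis of variance.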

If the fiducial boundary for a set of the coefficients is required, the
calculation is a little more complicated. Suppose that limits for the first $r$
coefficients are required; let $(T^{-1})_r$ be the matrix consisting of the first $r$
rows and columns of the inverse matrix $T^{-1}$, and let $t'_{hi}$ be a typical
element of the inverse of $(T^{-1})_r$. Then the quadratic form

\[ Q' = \sum_h \sum_i (b_h - \beta_h)(b_i - \beta_i) t'_{hi} \]

is a sum of squares with $r$ degrees of freedom, so that the ratio $Q'/rs^2$
is distributed as $F$ with $r$ and $n - p - 1$ degrees of freedom. The fiducial
region and its fiducial boundary may be determined using this distribution.
When $r = 2$, we have for the first two independent variables

\[ t'_{11} = t^{22}/[t^{11}t^{22} - (t^{12})^2] , \]

\[ t'_{12} = -t^{12}/[t^{11}t^{22} - (t^{12})^2] , \]

and

\[ t'_{22} = t^{11}/[t^{11}t^{22} - (t^{12})^2] ; \]

hence to test the departures of $\beta_1$ and $\beta_2$ from zero

\[ Q' = (t^{22}b_1^2 - 2t^{12}b_1b_2 + t^{11}b_2^2)/[t^{11}t^{22} - (t^{12})^2] . \]

3.11     CORRECTIONS 

A  correction  is  any  adjustment  in  the  value  of  one  variable  to  allow  for 
the  effect  of  the  variation  in  a  second  variable.  For  example,  the  strength 
properties  of  most  materials  are  affected  by  temperature.  If  a  strength 
determination  is  made  at  some  temperature  other  than  the  standard,  it  is 
customary  to  apply  an  empirical  correction,  given  by  the  product  of  the 
departure  of  temperature  from  standard  and  a  correction  factor,  to  give 
the  strength  value  that  would  be  expected  at  the  standard  temperature. 
It  will  be  clear  that  the  correction  factor  in  such  an  application  is  simply  a 
regression coefficient; and from a suitably designed experiment in which
strength  values  were  determined  at  various  temperatures,  the  correction 
factor  could  be  determined.  Here  the  assumption  is  made  that  the 
regression  of  strength  on  temperature  is  linear,  an  assumption  which  is 
near  enough  to  base  the  correction  on,  especially  if  the  correction  is  a 
small  fraction  of  the  measured  value. 

Sometimes,  rather  than  determining  the  double  regression  of  a  variable 
y  on  two  others  {xx  and  x2),  it  is  found  more  convenient  to  correct  one  of 
the  independent  variables  for  variations  in  the  other  and  to  carry  out  a 
simple  regression  on  the  corrected  variable.  This  procedure  is  satisfactory 
if  the  correction  required  is  small,  but  otherwise  it  may  give  results  that 
are  somewhat  artificial,  since  the  corrected  values  of  the  independent 
variable  may  not  correspond  to  observable  quantities.  It  is  actually 
appropriate  when  it  is  known  or  assumed  that  the  second  independent 
variable  x2  is  uncorrelated  with  the  dependent  variable;  the  correction 
then plays the part of reducing errors in $x_1$ affecting the relationship of $x_1$
with $y$.

These  two  types  of  correction  must  be  considered  separately,  since  the 
calculations  for  the  determination  of  each  are  different.  However,  for 
either  type,  the  appropriate  procedure,  given  a  sample  of  values  of  each 
variable, is to determine the regression of $y$ on $x_1$ and $x_2$, namely

$$Y = b_0 + b_1x_1 + b_2x_2.$$

(i)  y  to  be  Corrected  for  x2 

Here  the  appropriate  correction  factor  is  clearly  the  negative  of  the 
partial  regression  coefficient  b2.  Indeed,  for  the  sample  values,  the  simple 
regression of $y - b_2x_2$ on $x_1$ gives the regression coefficient

$$\frac{S(y - b_2x_2)(x_1 - \bar{x}_1)}{S(x_1 - \bar{x}_1)^2} = \frac{p_1 - b_2t_{12}}{t_{11}},$$

which, by (3.2), equals $b_1$, the partial regression coefficient of $y$ on $x_1$.
Hence the adjustment using $-b_2$ gives here the same results as a double
regression analysis.
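The identity just stated is easily checked numerically. The following is a minimal sketch only: the data are made-up illustrative values, and the helper names `corrected_sum` and `partial_coefficients` are hypothetical. It solves the two normal equations for $b_1$ and $b_2$ and then confirms that the simple regression of $y - b_2x_2$ on $x_1$ reproduces $b_1$.

```python
def corrected_sum(u, v):
    """Corrected sum of products S(u - ubar)(v - vbar)."""
    n = len(u)
    ubar, vbar = sum(u) / n, sum(v) / n
    return sum((a - ubar) * (b - vbar) for a, b in zip(u, v))


def partial_coefficients(x1, x2, y):
    """Solve b1*t11 + b2*t12 = p1 and b1*t12 + b2*t22 = p2 for (b1, b2)."""
    t11 = corrected_sum(x1, x1)
    t12 = corrected_sum(x1, x2)
    t22 = corrected_sum(x2, x2)
    p1, p2 = corrected_sum(x1, y), corrected_sum(x2, y)
    det = t11 * t22 - t12 ** 2
    return (p1 * t22 - p2 * t12) / det, (p2 * t11 - p1 * t12) / det


# Illustrative data only (assumed values, not from any experiment).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.1, 3.9, 7.2, 7.8, 11.1, 11.9]

b1, b2 = partial_coefficients(x1, x2, y)

# Adjust y for x2 with correction factor -b2, then regress on x1 alone.
y_adj = [yi - b2 * x2i for yi, x2i in zip(y, x2)]
b1_simple = corrected_sum(x1, y_adj) / corrected_sum(x1, x1)

print(b1, b1_simple)  # the two values agree up to rounding error
```

The agreement holds only because $b_2$ is taken from the double regression on the same sample; a correction factor borrowed from other data would in general not reproduce $b_1$ exactly.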

Often,  of  course,  values  of  y  and  x2  are  observed  initially,  and  the  y 
must  be  adjusted  for  the  x2,  regardless  of  what  later  use  will  be  made  of 
the  adjusted  values.  Then  all  that  can  be  done  is  to  take  the  simple 
regression  of  y  on  x2.  This  will  be  satisfactory,  provided  any  correlation 
between $x_1$ and $x_2$ is known to be fortuitous, and will on the average
vanish.  The  correction  factor  will  often  be  got  in  this  way,  or  even  be 
derived  from  one  set  of  data  and  applied  extensively  to  later  work.  These 
procedures are usually satisfactory as long as corrections are small.

(ii)  x1  to  be  Corrected  for  x2

From  the  form  of  the  regression  equation,  the  appropriate  correction 
factor  is  seen  to  be 

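$$c = b_2/b_1.$$

If the true correction factor is $\gamma$, then

$$b_2 - b_1\gamma$$

is distributed about zero with variance

$$s^2(t^{22} - 2t^{12}\gamma + t^{11}\gamma^2),$$

where the $t^{ij}$ are the elements of $T^{-1}$. Hence fiducial limits for $\gamma$ are
given as the values for which $F$ with 1 and $n - 3$ degrees of freedom is
significant:

$$F = \frac{(b_2 - b_1\gamma)^2}{s^2(t^{22} - 2t^{12}\gamma + t^{11}\gamma^2)}.$$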


Here,  too,  the  correction  factor  will  often  be  determined  for  one  set  of 
data  and  applied  generally  to  subsequent  sets.  The  fiducial  limits  for  the 
true  value  therefore  give  some  indication  of  the  error  likely  to  be  introduced 
by  general  application  of  the  correction. 

In correcting $x_1$ for variation in $x_2$ we shall sometimes want to apply a
correction regardless of what variable $x_1$ is to be correlated with. Here
the negative of the regression coefficient of $x_1$ on $x_2$ would have to be used
as the correction factor. It can be shown that this is exactly the same as
the correction factor $b_2/b_1$ provided the simple correlation of $x_2$ with $y$ is
zero; in other words, provided $x_2$ is a variable introducing "errors" into
$x_1$ but not affecting $y$. For then, since $p_2 = 0$,

$$b_2 = -p_1t_{12}/(t_{11}t_{22} - t_{12}^2)$$

and

$$b_1 = p_1t_{22}/(t_{11}t_{22} - t_{12}^2),$$

so that

$$b_2/b_1 = -t_{12}/t_{22},$$

the negative of the regression coefficient of $x_1$ on $x_2$. Thus the effect of
the correction is to increase the regression of $y$ on $x_1$ through its effect in
reducing the errors in $x_1$ owing to the lack of control of $x_2$.
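A small numerical check of this equivalence may be helpful. The sketch below rests on assumed, purely illustrative data, and the helper name `corrected_sum` is hypothetical. The sample condition $p_2 = 0$ is forced by removing from $y$ its simple regression on $x_2$; the ratio $b_2/b_1$ from the double regression is then compared with the negative of the regression coefficient of $x_1$ on $x_2$.

```python
def corrected_sum(u, v):
    """Corrected sum of products S(u - ubar)(v - vbar)."""
    n = len(u)
    ubar, vbar = sum(u) / n, sum(v) / n
    return sum((a - ubar) * (b - vbar) for a, b in zip(u, v))


# Illustrative data only (assumed values).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y_raw = [3.1, 3.9, 7.2, 7.8, 11.1, 11.9]

t11 = corrected_sum(x1, x1)
t12 = corrected_sum(x1, x2)
t22 = corrected_sum(x2, x2)

# Remove from y its simple regression on x2, so that p2 = 0 in the sample.
slope = corrected_sum(x2, y_raw) / t22
x2_mean = sum(x2) / len(x2)
y = [yi - slope * (x2i - x2_mean) for yi, x2i in zip(y_raw, x2)]

p1, p2 = corrected_sum(x1, y), corrected_sum(x2, y)
det = t11 * t22 - t12 ** 2
b1 = (p1 * t22 - p2 * t12) / det
b2 = (p2 * t11 - p1 * t12) / det

ratio = b2 / b1                   # correction factor from the double regression
neg_slope_x1_on_x2 = -t12 / t22   # negative of the regression of x1 on x2
print(ratio, neg_slope_x1_on_x2)
```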

We  now  give  an  example  of  a  correction  applied  to  an  independent 
regression  variable. 

Example 3.2 Determination of Total Solids in Skim Milk Concentrates
by Means of a Refractometer; Adjustment of Results for
Temperature Variation (Lawrence, 1955). In order to calibrate a refractometer
for the determination of total solids in skim milk, a series of 55 samples
was taken and refractometer reading and total solids were determined for each.
It was stated that refractometer readings should be adjusted for temperature,
the correction factor being -0.08 per degree centigrade, so temperature was
also recorded.

It  is  unnecessary  to  present  the  original  data  here.  The  regression  equation 
for total solids $y$ in terms of refractometer reading $x_1$ and temperature $x_2$ was
found to be

$$Y = 1.33 + 0.9043x_1 - 0.057x_2,$$

and the covariance matrix for the regression coefficients (that is, $s^2T^{-1}$) was

$$\begin{bmatrix} 58.51 & -2.7 \\ -2.7 & 1048 \end{bmatrix} \times 10^{-6}.$$

It  is  clear  from  these  results  that  the  coefficient  b2  is  not  significantly  different 
from  zero,  its  standard  error  being  0.032,  and  hence  that  the  correction  factor  is 
not  significant.  However,  it  is  of  interest  to  determine  the  correction  factor 
and  its  fiducial  limits  and  to  test  its  concordance  with  the  given  correction 
factor  -0.08. 

The 1 per cent point of $F$ with 1 and 52 degrees of freedom is 7.15, so that the
equation for the 99 per cent fiducial limits of $\gamma$ is

$$7.15 = \frac{10^6(-0.057 - 0.9043\gamma)^2}{1048 + 5.4\gamma + 58.51\gamma^2},$$

which reduces to

$$817{,}400\gamma^2 + 103{,}050\gamma - 4243 = 0,$$

giving the limits as

$$-0.16 \quad\text{and}\quad +0.03.$$

Thus the given value -0.08 does not differ significantly from the value given
by  the  data.    In  view  of  this,  the  given  value  of  the  correction  should  be  used. 
The  estimate  of  the  correction  factor  from  the  data  is 

-0.057/0.9043  =  -0.063. 

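In this example, since the standard error of $b_1$ is relatively small, it would be
sufficiently accurate to derive fiducial limits for $\gamma$ using the ratio $b_2/b_1$ and an
approximate standard error. The approximate variance is given as

$$V(b_2/b_1) \approx \frac{1}{b_1^2}\left[V(b_2) - 2\frac{b_2}{b_1}\,\mathrm{Cov}(b_1, b_2) + \frac{b_2^2}{b_1^2}V(b_1)\right] = 0.00128,$$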


whence  the  standard  error  is  0.0358,  giving  the  same  fiducial  limits  as  were 
given  before.  The  approximate  method  is  very  often  sufficiently  accurate,  but 
the  exact  method  just  given  is  certain  and  is  therefore  more  generally  applicable. 
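The arithmetic of Example 3.2 can be retraced in a few lines. The sketch below takes the coefficients, the covariance matrix, and the quoted $F$ value from the example; the function name `fiducial_limits_ratio` is hypothetical, and the approximate variance uses the usual first-order formula for a ratio, which is an assumption about the method rather than a transcription of it.

```python
import math


def fiducial_limits_ratio(b1, b2, v11, v12, v22, f_crit):
    """Roots in g of (b2 - b1*g)^2 = f_crit * (v22 - 2*v12*g + v11*g^2)."""
    a = b1 ** 2 - f_crit * v11
    b = -2.0 * b1 * b2 + 2.0 * f_crit * v12
    c = b2 ** 2 - f_crit * v22
    disc = b ** 2 - 4.0 * a * c
    if a <= 0 or disc < 0:
        raise ValueError("limits do not form a finite interval for these inputs")
    root = math.sqrt(disc)
    return (-b - root) / (2.0 * a), (-b + root) / (2.0 * a)


# Values quoted in Example 3.2.
b1, b2 = 0.9043, -0.057
v11, v12, v22 = 58.51e-6, -2.7e-6, 1048e-6   # elements of s^2 * T^{-1}
F_CRIT = 7.15                                # 1 per cent point of F(1, 52)

lower, upper = fiducial_limits_ratio(b1, b2, v11, v12, v22, F_CRIT)

# Approximate method: ratio with a first-order (large-sample) variance.
c_hat = b2 / b1
var_approx = (v22 - 2.0 * c_hat * v12 + c_hat ** 2 * v11) / b1 ** 2
se_approx = math.sqrt(var_approx)
t_crit = math.sqrt(F_CRIT)                   # t with 52 d.f. is the square root of F(1, 52)
approx_limits = (c_hat - t_crit * se_approx, c_hat + t_crit * se_approx)

print(round(lower, 2), round(upper, 2))
print(round(c_hat, 3), round(var_approx, 5), round(se_approx, 4))
```

The exact limits come from a quadratic whose leading coefficient is positive only when $b_1$ is itself significant at the chosen level; the sketch raises an error otherwise instead of returning a misleading interval.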

3.12    CURVILINEAR  (POLYNOMIAL)  REGRESSION 

The  fitting  of  curvilinear  regression  equations  defined  by  polynomials  of 
the  form 

$$Y = b_0 + b_1x + b_2x^2 + \cdots + b_px^p$$

is in principle no different from the fitting of multiple regression equations
as  defined  previously.  The  different  powers  of  x  simply  play  the  role  of 
the  different  independent  variables  in  the  earlier  discussion.  However, 
polynomial  regression  has  some  special  features  which  merit  separate 
consideration. 

In  the  first  place  it  should  be  pointed  out  that  the  fitting  of  polynomials 
of  high  degree  to  experimental  or  other  data  is  seldom  of  much  value. 
As  Snedecor  (1956,  p.  470)  says, 

The  student  may  well  question  the  advisability  of  fitting  curves.  A  stupendous 
amount  of  time  has  been  wasted  in  ill-advised  curve  fitting.  Only  when  the  end 
in  view  is  clear  should  the  task  be  undertaken.  Often  a  graph  of  the  data 
points  is  sufficient. . . .  Occasionally,  fitted  curves  are  required  for  interpolation. 
In  many  cases,  graphical  representation  of  the  data  is  sufficient. 

A  polynomial  is  usually  fitted  in  order  to  smooth  out  fluctuations  in 
the  data  caused  by  random  or  uncontrolled  errors,  not  because  it  is 
thought  to  represent  the  actual  relationship.  Unless  the  data  show  either 
a  linear  or  a  simple  parabolic  relationship,  it  may  be  equally  satisfactory 
to  smooth  the  data  by  plotting  the  points  and  drawing  a  freehand  curve. 
The  disadvantages  of  fitting  freehand  curves  are,  first,  the  possibility  of 
systematic  error  caused  by  the  unconscious  bias  of  the  experimenter  and, 
second,  the  impossibility  of  making  a  valid  estimate  of  the  residual 
variation  about  the  curve.  The  first  objection  can  be  overcome  by 
experience  and  also  by  the  freehand  curves'  being  checked  by  another 
worker  not  likely  to  be  similarly  biased.  The  second  objection  is  important 
only  occasionally,  and  when  it  is  some  form  of  polynomial  curve  must  be 
fitted  by  the  method  of  least  squares. 

In  the  fitting  of  polynomial  regressions,  moreover,  the  significance  of 
each  of  the  regression  coefficients  cannot  be  tested  in  the  same  way  as  can 
the  regression  coefficients  in  multiple  regression.  The  form  the  null 
hypothesis  takes  when  a  polynomial  regression  is  being  fitted  is  almost 
always  that  a  polynomial  of  a  certain  degree  represents  the  relationship, 
and  the  test  of  the  hypothesis  is  whether  terms  of  higher  degree  contribute 
significantly  to  the  relationship.  Hence,  in  fitting  a  relationship  of  this 
type,  it  is  not  open  to  us  to  omit  any  intermediate  term,  and  indeed  it  is  not 
valid  to  test  such  intermediate  terms  for  significance.  The  first  test  to  be 
made  is  an  over-all  test  of  whether  the  polynomial  in  fact  gives  a  significant 
regression  relation.  What  can  then  be  done  is  to  test  the  coefficient  of 
the  highest  power,  that  of  the  next  highest,  and  so  on,  successively;  if 
there  is  no  reason  to  the  contrary,  those  that  do  not  contribute  significantly 
to  the  regression  may  be  dropped  from  the  equation.  For  example,  if  a 
cubic  regression  were  being  fitted,  the  first  significance  test  would  be  of 
the over-all regression with three degrees of freedom; if this were not
significant, there would be no need to make any further test. If the
over-all regression is significant, the question arises whether the relationship
may be adequately represented by a polynomial of lower degree;
accordingly, the significance of the coefficient $b_3$ is next tested. If it is
significant,  the  cubic  regression  relation  will  be  accepted.  If  it  is  not 
significant,  a  quadratic  regression  may  be  fitted.  It  should  be  remarked 
that,  when  b3  is  dropped,  all  the  other  coefficients  need  to  be  recalculated. 
The  quadratic  coefficient  will  then  be  tested,  to  determine  whether  a 
linear  or  a  quadratic  relation  is  adequate.  Tests  can  be  made  in  this  way 
for  polynomial  regressions  of  any  degree. 
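The step-down procedure can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the data are synthetic, the function names are hypothetical, the preliminary over-all test is omitted, and a single critical value `t_crit` is used for every step although in practice it depends on the residual degrees of freedom. The point shown is that the coefficient of the highest power is tested first and that all coefficients are recalculated whenever a term is dropped.

```python
import math


def solve(a, b):
    """Solve a small linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][k] * z[k] for k in range(r + 1, n))) / m[r][r]
    return z


def highest_term_t(x, y, degree):
    """Fit a polynomial of the given degree; return (coefficients, t of highest power)."""
    n, k = len(x), degree + 1
    rows = [[xi ** j for j in range(k)] for xi in x]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    resid = [yi - sum(c * v for c, v in zip(coef, r)) for r, yi in zip(rows, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    unit = [0.0] * degree + [1.0]
    var_top = s2 * solve(xtx, unit)[degree]   # s^2 times the diagonal inverse element
    return coef, coef[degree] / math.sqrt(var_top)


def step_down(x, y, max_degree, t_crit):
    """Drop the highest power while it is not significant, refitting each time."""
    degree = max_degree
    while degree > 0:
        coef, t = highest_term_t(x, y, degree)
        if abs(t) >= t_crit:
            return degree, coef
        degree -= 1
    return 0, [sum(y) / len(y)]


# Synthetic data: an exact quadratic plus a small alternating disturbance.
x = [float(i) for i in range(10)]
noise = [0.01 * (-1) ** i for i in range(10)]
y = [1 + 2 * xi + 3 * xi ** 2 + e for xi, e in zip(x, noise)]

degree, coef = step_down(x, y, max_degree=3, t_crit=2.5)
print(degree, [round(c, 3) for c in coef])
```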

There  are,  of  course,  exceptions  to  this  procedure,  for  example,  when 
the  null  hypothesis  specifies  a  regression  through  the  origin;  then  it 
would  be  correct  to  test  the  significance  of  b0  before  testing  any  of  the 
other  coefficients. 

In  practice,  the  mathematical  form  of  a  polynomial  regression  equation 
is  seldom  of  importance  except  over  long  ranges  of  the  independent 
variable;  it  is  often  impossible  to  distinguish  fitted  curves  of  different 
form,  provided  each  has  the  same  number  of  parameters,  so  that  each  can 
accommodate  itself  equally  well  to  the  trends  in  the  data. 

3.13    EXAMPLE  ON  THE  FITTING  OF  A  POLYNOMIAL 
REGRESSION 

Example  3.3  Fitting  a  Quadratic  Regression  of  Janka  Hardness 
on Air-Dry Density of Timber Species. The data shown in Table 3.7 are
the  mean  values  of  Janka  hardness  for  a  number  of  species  of  timber,  together 
with  the  corresponding  values  of  mean  density.  It  is  known  that  the  relationship 
between  hardness  and  density  is  nonlinear,  so  a  regression  relationship  of  the 
form 

$$Y = b_0 + b_1x + b_2x^2$$

has  been  fitted.    The  relevant  sums  of  squares  and  products  are  shown  in 
Table  3.8,  from  which  the  inverse  matrix  and  the  regression  coefficients,  shown 
in  Table  3.9,  are  obtained.    The  analysis  of  variance  is  given  in  Table  3.10. 
The  regression  equation  is  thus  found  to  be 

$$Y = -120 + 9.5x + 0.51x^2.$$

The standard errors of $b_1$ and $b_2$ are, respectively,

$$\sqrt{26{,}170 \times 8519.198 \times 10^{-6}} = 14.9$$

and

$$\sqrt{26{,}170 \times 0.937915 \times 10^{-6}} = 0.157.$$

For the significance of $b_2$ we have

$$t = 0.509/0.157 = 3.24 \quad\text{(significant at 1 per cent level).}$$

Since the coefficient of $x^2$ is significant, the quadratic form of regression is
retained. In this instance it is a reasonable assumption that the regression
line will pass through the origin. A direct test of the significance of $b_0$ can be
made; or an equivalent test can be made by fitting a regression through the
origin and testing the increase in the residual sum of squares for significance.
Any significant departure of $b_0$ from zero will indicate either that the quadratic
relation is not the true relation, that there is a "zero-error" in the hardness
values, or that errors in the density measurement have reduced the regression
coefficients from the values they would take if density were errorless. This
last alternative is not likely, since density is accurately determined and is
moreover determined on the hardness test specimens themselves.

TABLE 3.7
Values of Density, (Density)^2, and Janka Hardness for 36 Species of Timber

Density, x,              Janka          Density, x,              Janka
lb./cu. ft.     x^2    Hardness, y      lb./cu. ft.     x^2    Hardness, y

   67.4        4543       2700             40.7        1656       1100
   68.8        4733       2890             66.0        4356       3260
   69.1        4775       2740             59.8        3576       1940
   57.3        3283       2020             39.4        1552       1210
   24.8         615        427             59.2        3505       2310
   32.7        1069        704             51.5        2652       2010
   51.5        2652       1710             38.8        1505       1070
   38.5        1482        914             53.4        2852       1880
   46.9        2200       1400             35.6        1267        979
   28.4         807        517             30.3         918        587
   28.4         807        549             27.3         745        413
   40.3        1624       1160             39.9        1592        989
   29.0         841        648             56.0        3136       1980
   56.5        3192       1820             40.6        1648       1010
   42.9        1840       1270             69.1        4775       3140
   40.7        1656       1130             57.6        3318       1980
   24.7         610        484             45.8        2098       1180
   48.2        2323       1760             39.3        1544       1020

Total       1,646.4     81,747      52,901
Mean         45.733   2,270.75    1,469.47

TABLE 3.8
Sums of Squares and Products of Values in Table 3.7

              x             x^2               y
x         6,454.66        609,545          371,186
x^2     609,545        58,628,500       35,595,200
y                                       22,485,000

TABLE 3.9
Inverse Matrix and Regression Coefficients

            Inverse matrix (x 10^-6)              b_i
x        8519.198         -88.571 85           9.474 3
x^2       -88.571 85        0.937 914 7        0.508 631

TABLE 3.10
Analysis of Variance of Janka Hardness

               D. F.    Sum of Squares    Mean Square
Regression       2        21,621,500
Residual        33           863,500         26,170
Total           35        22,485,000

TABLE 3.11
Uncorrected Sums of Squares and Products, for Determination of Regression through Origin

              x              x^2                y
x        81,750.02        4,348,108         2,790,525
x^2   4,348,108         244,255,500       155,720,100
y                                         100,221,600

TABLE 3.12
Inverse Matrix from Table 3.11 and Regression Coefficients

            Inverse matrix (x 10^-6)              b_i
x         230.028 51       -4.094 846 5         4.250 4
x^2        -4.094 846 5     0.076 988 379       0.561 866

The  calculations  for  the  regression  through  the  origin  are  set  out  in  Tables 
3.11,  3.12,  and  3.13.  It  is  seen  that  the  coefficient  b0  is  not  significant.  Hence 
the  regression  through  the  origin  is  used: 

$$Y = 4.3x + 0.562x^2.$$
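The calculations of Example 3.3 can be retraced from the data of Table 3.7. The sketch below is an illustration only: the helper names are hypothetical, and it solves the two normal equations directly rather than through the tabulated inverse matrices. Because the tables carry rounded sums of squares and products, and the difference of large nearly equal quantities is sensitive to rounding, the coefficients computed in full precision may differ slightly in the later figures from the tabulated ones.

```python
import math

# Density (lb./cu. ft.) and Janka hardness for the 36 species of Table 3.7.
density = [67.4, 40.7, 68.8, 66.0, 69.1, 59.8, 57.3, 39.4, 24.8, 59.2, 32.7, 51.5,
           51.5, 38.8, 38.5, 53.4, 46.9, 35.6, 28.4, 30.3, 28.4, 27.3, 40.3, 39.9,
           29.0, 56.0, 56.5, 40.6, 42.9, 69.1, 40.7, 57.6, 24.7, 45.8, 48.2, 39.3]
hardness = [2700, 1100, 2890, 3260, 2740, 1940, 2020, 1210, 427, 2310, 704, 2010,
            1710, 1070, 914, 1880, 1400, 979, 517, 587, 549, 413, 1160, 989,
            648, 1980, 1820, 1010, 1270, 3140, 1130, 1980, 484, 1180, 1760, 1020]


def sum_products(u, v, corrected=True):
    """Sum of products, about the means if corrected, about zero otherwise."""
    n = len(u)
    ubar = sum(u) / n if corrected else 0.0
    vbar = sum(v) / n if corrected else 0.0
    return sum((a - ubar) * (b - vbar) for a, b in zip(u, v))


def fit_two_variables(x1, x2, y, corrected=True):
    """Solve the 2x2 normal equations; return (b1, b2, t^22, regression SS)."""
    t11 = sum_products(x1, x1, corrected)
    t12 = sum_products(x1, x2, corrected)
    t22 = sum_products(x2, x2, corrected)
    p1 = sum_products(x1, y, corrected)
    p2 = sum_products(x2, y, corrected)
    det = t11 * t22 - t12 ** 2
    b1 = (p1 * t22 - p2 * t12) / det
    b2 = (p2 * t11 - p1 * t12) / det
    return b1, b2, t11 / det, b1 * p1 + b2 * p2


n = len(density)
x = density
x_sq = [v * v for v in density]

# Quadratic with constant term (corrected sums, as in Tables 3.8 to 3.10).
b1, b2, inv22, reg_ss = fit_two_variables(x, x_sq, hardness)
b0 = sum(hardness) / n - b1 * sum(x) / n - b2 * sum(x_sq) / n
resid_ss = sum_products(hardness, hardness) - reg_ss
s2 = resid_ss / (n - 3)
t_b2 = b2 / math.sqrt(s2 * inv22)

# Quadratic through the origin (uncorrected sums, as in Tables 3.11 to 3.13).
c1, c2, _, reg_ss_origin = fit_two_variables(x, x_sq, hardness, corrected=False)
total_uncorrected = sum_products(hardness, hardness, corrected=False)
departure_ss = total_uncorrected - reg_ss_origin - resid_ss
f_origin = departure_ss / s2

print(round(b0), round(b1, 2), round(b2, 3), round(t_b2, 2))
print(round(c1, 2), round(c2, 3), round(f_origin, 2))
```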

TABLE 3.13
Analysis of Variance, for Regression through Origin

                          D. F.    Sum of Squares    Mean Square
Regression                  2        99,354,700
Departure from origin       1             3,400        3,400 (n)
Residual                   33           863,500       26,170
Total                      36       100,221,600

(n) Not significant

It may be noted here that the test we have carried out is a test of the
intercept of the regression line with the $Y$-axis; that is, a test of a possible
zero error in hardness. In some applications it will appear that any
departure from the origin will be due to a zero error in the independent
variable. Then the same significance test for departure from the origin is
appropriate, but different fiducial limits will be required. Suppose that
the value of $x$ for which $Y$ is zero is denoted by $\xi$. Then the sample
estimate of $\xi$ is a root of the equation

$$b_2\xi^2 + b_1\xi + b_0 = 0$$

and fiducial limits for $\xi$ will be given by the equation

$$(b_2\xi^2 + b_1\xi + b_0)^2 - Fs^2\left[t^{22}(\xi^2 - \overline{x^2})^2 + 2t^{12}(\xi^2 - \overline{x^2})(\xi - \bar{x}) + t^{11}(\xi - \bar{x})^2 + 1/n\right] = 0,$$

which is a biquadratic, giving four values of $\xi$.

These  limits  will  have  application  in  an  experiment  in  which,  for 
example,  the  regression  of  some  quantity  on  time  is  being  determined,  but 
the  exact  starting  time  of  the  process  cannot  be  recorded. 

These  methods  may  readily  be  extended  to  the  fitting  of  polynomials  of 
higher  degree. 

3.14    ORTHOGONAL  POLYNOMIALS 

When  an  equation  of  polynomial  form  is  being  fitted  to  data,  it  is 
usually  convenient  to  include  additional  terms  of  successively  higher 
degree  in  the  equation  until  a  satisfactory  fit  has  been  obtained;  if  a 
linear  equation  does  not  fit  satisfactorily,  a  quadratic  is  tried,  then  if 
necessary  a  cubic,  and  so  on.  In  this  process  the  computations  can  be 
considerably simplified if the regression variables are chosen to be not the
successive powers of the independent variable x but polynomials of
increasing  degree  in  x  which  are  uncorrelated  with  one  another  in  the 
sample.  When  the  independent  variables  are  defined  in  this  way,  the 
analysis  is  simplified  because  (i)  each  regression  coefficient  on  each 
successive  polynomial  may  be  calculated  independently  of  the  others,  as  a 
simple  regression  coefficient  would  be  calculated,  and  (ii)  the  sum  of 
squares for regression attributable to each polynomial is likewise independently
calculated and represents the amount by which the regression
sum  of  squares  is  increased  by  the  passage  from  an  equation  of  lower 
degree  to  an  equation  of  the  degree  of  the  polynomial.  Thus,  once  the 
orthogonal  polynomials  are  known,  the  contribution  of  any  power  of 
x  to  the  fit  of  the  regression  can  be  readily  determined. 
For example, if the values of $x$ were

    1    2    5    8,

the successive orthogonal polynomials could be taken to be as follows:

                                                              Values         Sum of Squares
$\xi_1 = x - 4$                                          -3   -2    1    4         30
$\xi_2 = (2\xi_1^2 - 2\xi_1 - 15)/3$                      3   -1   -5    3         44
$\xi_3 = (55\xi_1^3 - 95\xi_1^2 - 554\xi_1 + 300)/42$    -9   14   -7    2        330

It is readily verified that these are uncorrelated in the sample; for
instance,

$$S(\xi_2\xi_3) = 3 \times (-9) + (-1) \times 14 + (-5) \times (-7) + 3 \times 2 = 0.$$

Their sums of squares are shown in the right-hand column. Thus, if the
corresponding values of $y$ are

$$y_1, \quad y_2, \quad y_5, \quad y_8,$$

the sums of squares attributable to each increase of degree of the regression
equation are as follows:

Linear       $(-3y_1 - 2y_2 + y_5 + 4y_8)^2/30$
Quadratic    $(3y_1 - y_2 - 5y_5 + 3y_8)^2/44$
Cubic        $(-9y_1 + 14y_2 - 7y_5 + 2y_8)^2/330.$

Needless to say, the calculation of the set of orthogonal polynomials for
every  fitting  of  a  curve  is  too  laborious  to  be  worthwhile.  Although  the 
numerical  values  of  the  polynomials  just  given  are  simple  enough,  the 
algebraic  expressions  for  them  are  somewhat  cumbersome  and,  of  course, 
become  more  so  as  the  degree  of  the  polynomials  increases.  However, 
for  many  data  the  values  of  x  are  equally  spaced  and  equally  represented, 
and  for  these  the  values  have  been  tabulated.  Fisher  and  Yates  (1953) 
give the polynomials of degree up to 5, for numbers of values of $x$ up to 52.
This  range  should  be  adequate  for  all  practical  requirements. 
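For unequally spaced values of $x$ the orthogonal polynomials are not tabulated, but their values are readily generated by machine. The sketch below is one way of doing so, under stated assumptions: it applies the Gram-Schmidt process to successive powers of $x$ for the four values 1, 2, 5, 8 used earlier in this section, the function name is hypothetical, and the $y$ values are invented for illustration. The rescaling factors are simply the leading coefficients of the polynomials as given in the text.

```python
def orthogonal_polynomial_values(x, degree):
    """Values at x of polynomials of degree 1..degree that are mutually
    orthogonal and orthogonal to the constant (Gram-Schmidt on powers of x)."""
    n = len(x)
    basis = [[1.0] * n]
    for k in range(1, degree + 1):
        v = [float(xi) ** k for xi in x]
        for q in basis:
            coef = sum(a * b for a, b in zip(v, q)) / sum(b * b for b in q)
            v = [a - coef * b for a, b in zip(v, q)]
        basis.append(v)
    return basis[1:]


x = [1, 2, 5, 8]
raw = orthogonal_polynomial_values(x, 3)

# Rescale the monic polynomials by the leading coefficients used in the text.
scales = [1.0, 2.0 / 3.0, 55.0 / 42.0]
xi = [[round(s * v, 6) for v in poly] for s, poly in zip(scales, raw)]

# Sum of squares attributable to each degree, for assumed illustrative y values.
y = [2.0, 3.0, 7.0, 9.0]
components = [
    sum(a * b for a, b in zip(poly, y)) ** 2 / sum(a * a for a in poly)
    for poly in xi
]
y_mean = sum(y) / len(y)
total_ss = sum((v - y_mean) ** 2 for v in y)

print(xi)
print(components, total_ss)
```

With four points a cubic fits exactly, so the three components add up to the total corrected sum of squares of $y$; this provides a convenient check on the arithmetic.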

Example  3.4  Relationship  between  the  Bursting  Strength  of  Paper 
and  Its  Basis  Weight.  In  Table  3.14  are  given  values  of  the  mean  bursting 
strength  of  paper  made  from  standard  Ljusnam  kraft  pulp  beaten  for  16,000 
revolutions  in  the  Lampen  mill.  The  paper  was  made  in  nine  different  basis 
weights ranging from 10 to 90 grams per square meter, there being three replications
of each. Although the bursting strength increases with basis weight,
it  obviously  does  not  increase  linearly,  so  a  polynomial  curve  is  sought  to  fit 
the  data.  The  orthogonal  polynomials  for  n'  =  9,  taken  from  Fisher  and 
Yates's  Table  XXIII,  are  also  set  out  in  Table  3.14.  The  deviations  from 
regression  are  tested  against  the  estimated  variance  of  means  of  three  values, 
3804  with  16  degrees  of  freedom,  derived  from  the  full  analysis  of  the  experiment. 
The  calculations  are  given  in  the  right-hand  columns  of  the  table.  We  see  that 
deviations  from  a  quadratic  curve  are  highly  significant,  but  that  those  from  a 
cubic  are  not  significant,  so  that  a  cubic  curve  provides  a  satisfactory  fit  to  the 
data. 

In  order  to  use  the  regression  function,  it  may  be  desirable  to  convert  it  into  a 
polynomial  in  x  explicitly.  Fisher  and  Yates,  in  the  Introduction  to  their  tables, 
show  how  this  may  be  done.  However,  for  many  purposes,  it  is  equally 
convenient to work with the orthogonal polynomials $\xi'$. Thus, in the present
example, the cubic regression equation is

$$Y = 4035.7 + 924.3\xi_1' - 1.5\xi_2' + 8.7\xi_3'.$$

Since the values of the $\xi'$ are given, the estimated values corresponding to the
experimental points are easily derived; thus, corresponding to $x = 40$, the
estimated value is

$$4035.7 - 1 \times 924.3 - 17 \times (-1.5) - 9 \times 8.7 = 3059.$$

The  estimates  from  the  fitted  cubic  are  given  at  the  foot  of  Table  3.14. 

The standard errors of estimates are also readily derived in terms of the
orthogonal polynomials. In this example,

$$V(Y) = 3804\left(\frac{1}{9} + \frac{\xi_1'^2}{60} + \frac{\xi_2'^2}{2772} + \frac{\xi_3'^2}{990}\right).$$

At $x = 40$, $Y = 3059$, the variance is

$$3804\left(\frac{1}{9} + \frac{1}{60} + \frac{289}{2772} + \frac{81}{990}\right) = 1194;$$

$$\text{S.E.}(3059) = 34.6.$$
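The variance formula lends itself to a short routine. The following sketch uses only figures quoted in the example (the variance 3804, the nine basis weights, and the sums of squares 60, 2772, and 990); the function name `variance_of_estimate` is hypothetical.

```python
import math

S2 = 3804.0                              # estimated variance of a mean of three values
N_LEVELS = 9                             # number of basis weights
SUMS_OF_SQUARES = [60.0, 2772.0, 990.0]  # linear, quadratic, cubic polynomials


def variance_of_estimate(xi_values):
    """V(Y) = s^2 * (1/n' + sum over k of xi_k'^2 / S(xi_k'^2))."""
    terms = sum(v * v / ss for v, ss in zip(xi_values, SUMS_OF_SQUARES))
    return S2 * (1.0 / N_LEVELS + terms)


# Polynomial values at x = 40 as used in the text; only their squares enter.
var_40 = variance_of_estimate([-1.0, -17.0, -9.0])
se_40 = math.sqrt(var_40)
print(round(var_40), round(se_40, 1))  # 1194 34.6
```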

Two warnings should be given. The ease with which curves of high
degree  may  be  fitted  by  means  of  orthogonal  polynomials  may  encourage 
indiscriminate  curve  fitting.  It  must  always  be  remembered  that,  for 
many  practical  purposes,  a  hand-fitted  curve  is  satisfactory. 

The  other  point  is  that,  because  the  regression  coefficients  and  sums 
of  squares  may  be  determined  separately  for  each  orthogonal  polynomial, 
it  is  easy  to  get  the  impression  that  a  polynomial  term  of  any  degree  may 
be  fitted  on  its  own.  Thus,  in  Example  3.4,  we  might  be  tempted,  since 
the  regression  on  the  quadratic  polynomial  was  not  significant,  to  fit 
only  the  linear  and  cubic  terms.  It  must  be  clearly  remembered  that  the 
polynomial  of  any  given  degree  is  constructed  specifically  to  be  orthogonal 
to  all  lower  powers  of  x,  in  order  that  the  sum  of  squares  for  regression  on 
that  polynomial  may  measure  merely  the  contribution  made  by  increasing 
the  degree  of  the  fitted  curve  to  that  of  the  polynomial.  In  fitting  a  curve, 
all  orthogonal  polynomials  up  to  that  of  the  highest  degree  fitted  must  be 
included,  whether  or  not  their  contributions  are  significant. 

3.15    EQUATIONS  WITH  COEFFICIENTS  SUBJECT 
TO  LINEAR  RESTRICTIONS 

Sometimes  it  is  necessary  to  fit  a  regression  relation  in  which  the 
coefficients  are  restricted  in  order  to  satisfy  some  conditions.  The 
simplest  case,  which  has  been  dealt  with  already,  is  that  in  which  one  of 
the  coefficients  takes  a  fixed  value;  if  the  given  value  is  zero,  the  variable 
corresponding  to  it  is  simply  omitted  from  consideration ;  if  the  constant 
term  is  to  be  zero,  so  that  the  line  (or,  in  general,  the  surface  of  higher 
dimension)  passes  through  the  origin,  the  correction  for  the  mean  is 
omitted.  In  testing  hypothetical  values  of  a  set  of  the  coefficients,  we 
determine,  in  effect,  the  regression  in  which  these  coefficients  are  given 
their  hypothetical  values. 

More  complicated  cases,  although  they  do  not  occur  frequently,  are 
sufficiently  important  to  be  worth  discussion.  Several  of  the  coefficients 
may  have  to  satisfy  a  condition,  which  may  be  either  linear  or  nonlinear. 
An  example  of  coefficients  required  to  satisfy  a  nonlinear  condition  arises 
in  the  fitting  of  concurrent  regression  lines,  which  is  discussed  in  Chapter  8. 
More  usually,  the  condition  is  linear,  and  only  linear  restrictions  will  be 
considered  here. 

As  an  example  of  a  linear  restriction,  consider  the  estimation  of  the 
results  of  a  destructive  test  from  tests  on  neighboring  specimens.  In 
experiments  in  which  it  is  required  to  estimate,  say,  the  ultimate  failing 
load  of  a  specimen  in  order  to  test  it  under  a  reduced  load,  tests  must 
perforce be made on adjacent specimens from the same material. Since
the material is presumed to be homogeneous, it is natural to adopt an
estimating equation, based on $p$ neighboring specimens, of the form

$$Y = b_1'x_1 + b_2'x_2 + \cdots + b_p'x_p,$$

where

$$b_1' + b_2' + \cdots + b_p' = 1.$$

The $x_i$ are the values of ultimate failing load in specimens located in the
$i$th position in relation to the test specimen.

To  determine  the  constants  of  this  equation  or  to  test  the  concordance 
of  a  given  form  of  relationship  with  the  data,  tests  must,  in  the  first  place, 
be  made  on  the  p  +  1  specimens  from  a  piece  of  material,  including  that 
at  the  location  to  be  estimated.  Subsequent  tests  to  use  the  relationship 
would  be  made  on  only  p  specimens. 

A  further  restriction  would  be  made  if  the  test  specimen  (giving  the 
result  y)  were  placed  centrally  with  respect  to  the  other  specimens,  in 
either  one,  two,  or  three  dimensions.  Then,  by  symmetry,  the  results  for 
specimens  in  equivalent  positions  would  be  given  equal  coefficients.  Such 
a case is illustrated in the accompanying diagram.

Two-Dimensional Symmetry

    b1      b2      b1
    b3     (y)      b3
    b1      b2      b1

These restrictions, however, cause no difficulty, since their effect is to
reduce the number of
variables  effectively  to  the  number  with  different  coefficients.  Thus,  in 
this  diagram,  one  independent  variable  could  be  taken  as  the  mean  (or 
total)  of  results  for  the  four  corner  elements,  another  as  the  mean  for  the 
top  and  bottom  center  elements,  and  so  on. 

In  general,  the  linear  restriction  will  not  be  so  simple  in  form.  We 
shall denote the unrestricted regression coefficients by $b_i$ and the restricted
coefficients by $b_i'$. We shall assume that a $p$-variable regression is to be
determined, the coefficients being subject to the restriction

$$a_0b_0' + a_1b_1' + \cdots + a_pb_p' = k. \tag{3.8}$$

We  may,  in  some  cases,  use  the  relation  (3.8)  to  eliminate  one  of  the 
unknown  coefficients  and  then  solve  for  the  others.  A  more  general 
method,  which  has  the  advantage  of  preserving  the  symmetry  of  the 
equations, is to introduce the Lagrangian multiplier $\lambda$ and to proceed
regardless of the restriction. The quantity to be minimized is then

$$S(y - b_0' - b_1'x_1 - \cdots - b_p'x_p)^2 + 2\lambda(a_0b_0' + a_1b_1' + \cdots + a_pb_p'),$$

leading to the normal equations

$$b_1't_{i1} + b_2't_{i2} + \cdots + b_p't_{ip} = p_i - \lambda(a_i - a_0\bar{x}_i). \tag{3.9}$$

In  most  practical  examples  a0  equals  zero,  there  being  no  restriction  on 
the  constant  term  in  the  equation.  When  a0  is  not  zero,  one  possible 
procedure is to treat 1 as an independent variable and work with uncorrected
sums of squares and products; alternatively, if corrected sums
of squares and products are used, the $a_i$ need to be corrected also, the
corrected values being

$$a_i' = a_i - a_0\bar{x}_i,$$

as  shown  in  (3.9).  Henceforth  we  shall  assume,  without  any  loss  of 
generality,  that  a0  =  0. 

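From the equations (3.9) the regression coefficients are found to be

$$b_h' = \sum_i (p_i - \lambda a_i)t^{hi} = b_h - \lambda q_h, \tag{3.10}$$

where

$$q_h = \sum_i a_it^{hi}.$$

Multiplying (3.10) by $a_h$ and summing with respect to $h$, we find

$$k = \sum_i a_ib_i - \lambda\sum_i a_iq_i,$$

so that

$$\lambda = \left(\sum_i a_ib_i - k\right)\Big/\sum_i a_iq_i$$

and

$$b_h' = b_h - q_h\left(\sum_i a_ib_i - k\right)\Big/\sum_i a_iq_i.$$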

It  is  readily  confirmed  that  the  restricted  coefficients  satisfy  the  condition 
(3.8). 
The residual sum of squares is

$$S(y - \bar{y})^2 - \sum_i b_i'p_i - \lambda k$$

and the regression sum of squares is accordingly

$$\sum_i b_i'p_i + \lambda k.$$

Alternatively, the effect of the restriction must be to reduce the regression
sum of squares by an amount representing the departure of the unrestricted
coefficients from the condition required; this deduction from the sum of
squares is

$$\left(\sum_i a_ib_i - k\right)^2\Big/\sum_i a_iq_i;$$

the residual sum of squares is increased by the same amount.

It is advantageous to express these results in terms of the unrestricted
regression  coefficients,  because  it  will  often  be  necessary  to  test  the  effect 
of  the  restriction  in  this  way.  The  easiest  way  to  determine  the  variances 
and  covariances  of  the  restricted  regression  coefficients  is  to  recognize 
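that they are linear combinations of the unrestricted coefficients. We
have in particular

$$V(\lambda) = s^2\Big/\sum_i a_iq_i$$

and

$$\mathrm{Cov}(b_i, \lambda) = s^2q_i\Big/\sum_j a_jq_j,$$

from which we find

$$V(b_i') = s^2\left(t^{ii} - q_i^2\Big/\sum_j a_jq_j\right),$$

$$\mathrm{Cov}(b_i', b_j') = s^2\left(t^{ij} - q_iq_j\Big/\sum_h a_hq_h\right).$$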

These  formulas  are  similar  to  those  obtained  when  one  independent 
variable  is  eliminated  from  a  multiple  regression.  Indeed,  the  effect  of 
the  restriction  is  equivalent  to  that  of  eliminating  from  the  regression  the 
independent variable

$$X = \sum_i q_ix_i\Big/\sum_i a_iq_i.$$

The sum of products of $y$ with $X$ is

$$\sum_i p_iq_i\Big/\sum_i a_iq_i = \sum_i a_ib_i\Big/\sum_i a_iq_i$$

and the sum of squares of $X$ is $1/\sum_i a_iq_i$, so that the simple regression
coefficient of $y$ on $X$ is

$$\sum_i a_ib_i.$$

The  sum  of  squares  for  the  restriction  is  also  the  sum  of  squares  for 
departure  of  this  regression  coefficient  from  its  hypothetical  value  k. 

Sometimes  more  than  one  restriction  is  imposed  on  the  regression 
coefficients;  the  discussion  just  given  may  be  generalized  to  such  cases 
but  need  not  be  considered  here,  since  no  new  principle  is  involved. 
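
The adjustment from unrestricted to restricted coefficients can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it takes $q_h = \sum_i a_i t^{hi}$, with $(t^{hi})$ the inverse of the matrix of sums of squares and products, and the function name and the two-variable numbers are hypothetical, not taken from the text.

```python
def restrict_coefficients(b, t_inv, a, k):
    """Adjust unrestricted coefficients b so that sum(a[i] * b'[i]) == k.

    t_inv is the inverse of the matrix of sums of squares and products
    of the independent variables, given as a list of rows.
    Returns (restricted coefficients, lambda, deduction from regression SS).
    """
    n = len(b)
    # q[h] = sum_i a[i] * t_inv[h][i]  (assumed definition of q)
    q = [sum(a[i] * t_inv[h][i] for i in range(n)) for h in range(n)]
    departure = sum(a[i] * b[i] for i in range(n)) - k
    aq = sum(a[i] * q[i] for i in range(n))
    lam = departure / aq
    b_restricted = [b[h] - q[h] * lam for h in range(n)]
    deduction = departure ** 2 / aq
    return b_restricted, lam, deduction


# Hypothetical two-variable illustration (not data from the text).
b_prime, lam, deduction = restrict_coefficients(
    b=[0.7, 0.5], t_inv=[[1.0, 0.0], [0.0, 1.0]], a=[1.0, 1.0], k=1.0)
print(b_prime, lam, deduction)
```

The restricted coefficients returned here satisfy the condition exactly (up to rounding), and the third value is the amount by which the regression sum of squares is reduced.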

3.16    EXAMPLE  OF  A  RESTRICTED  REGRESSION 
RELATION 

Example 3.5 Estimation of Test Values from Values for Neighboring
Specimens. In an experiment planned to study the effect of duration
of loading on small compression specimens of wood, it was decided to estimate
their maximum short-time load and to apply a load which was a given fraction
of the maximum. The estimates were derived from results for adjacent
specimens used as controls, which were tested to failure in the usual way.
From each of a number of boards five end-matched specimens were taken, the
specimens at the ends and middle being used as controls. In order to derive
the regression equation, a series of determinations of maximum load on all
five specimens from 114 boards was also made. The designation of the
dependent variable ($y$) and the independent variables ($x_1$, $x_2$, and
$x_3$) is shown in the following diagram:

         x1    y    x2    -    x3
and      x3    -    x2    y    x1.

Each board provides two sets of values, as the diagram shows; the values of
$x_1$ and $x_3$ have to be interchanged to correspond to the two values of
$y$. In order to ensure that the estimator is homogeneous in the $x_i$, and
that its mean equals the mean of $y$, we take

    $$b_0' = 0$$

and

    $$b_1' + b_2' + b_3' = 1,$$

so that

    $$a_1 = a_2 = a_3 = k = 1.$$


Since  the  data  are  rather  extensive,  they  are  not  given  here,  but  the  relevant  sums 
of  squares  and  products  (uncorrected  for  the  mean)  are  set  out  in  Table  3.15. 


TABLE 3.15

Sums of Squares and Products (Uncorrected) of
Maximum Compressive Strength Values

(all entries x 10^6)

             x1             x2             x3             y
   x1   12,061.5343    12,030.9275    12,032.6124    12,049.1993
   x2   12,030.9275    12,047.2756    12,030.9275    12,041.8576
   x3   12,032.6124    12,030.9275    12,061.5343    12,042.3124
   y                                                 12,069.7072


In  this  example  the  corresponding  values  of  all  the  variables  are  close 
together.  The  sums  of  squares  and  products  uncorrected  for  the  mean 
are  therefore  large  and  nearly  equal,  and  their  matrix  is  "almost  singular," 
so  that  its  inverse  cannot  readily  be  determined  accurately.  We  can, 
however,  reduce  the  size  of  the  elements  to  give  a  matrix  more  amenable 
to  calculation  by  means  of  a  simple  transformation  of  the  variables.  If  we 
put 


    $$z_1 = y - x_1, \qquad z_2 = y - x_2, \qquad\text{and}\qquad z_3 = y - x_3,$$

the equation

    $$Y' = b_1' x_1 + b_2' x_2 + b_3' x_3$$

becomes

    $$y - Y' = b_1' z_1 + b_2' z_2 + b_3' z_3 .$$

The equation in this form shows that $b_1'$, $b_2'$, and $b_3'$ are to be
chosen to minimize

    $$S(y - Y')^2 = S(b_1' z_1 + b_2' z_2 + b_3' z_3)^2,$$

subject to the condition

    $$b_1' + b_2' + b_3' = 1.$$

The uncorrected sums of squares and products of the $z_i$ are set out in Table
3.16. If these are denoted by $v_{hi}$, the quantity to be minimized is

    $$\sum_h \sum_i b_h' b_i' v_{hi} - 2\lambda \sum_i b_i' .$$


TABLE 3.16

Sums of Squares and Products of New Variables,
Derived from Those of Table 3.15

(all entries x 10^6)

            z1          z2          z3
   z1    32.8429      9.5778     10.8079
   z2     9.5778     33.2676     16.4647
   z3    10.8079     16.4647     46.6167


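The normal equations are accordingly

    $$\sum_i b_i' v_{hi} = \lambda,$$

giving for the regression coefficients

    $$b_h' = \lambda \sum_i v^{hi},$$

where $(v^{hi})$ is the inverse of the matrix $(v_{hi})$. Hence

    $$\lambda = 1 \Big/ \sum_h \sum_i v^{hi}$$

and

    $$b_h' = \sum_i v^{hi} \Big/ \sum_h \sum_i v^{hi} .$$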

The  inverse  matrix  and  the  regression  coefficients  are  given  in  Table  3.17. 


TABLE 3.17

Inverse Matrix, Residual Sum of Squares, and Regression
Coefficients

         Inverse matrix (x 10^-12)                 Sum          b_i'
    34,330.80     -7,203.87     -5,415.10      21,711.83     0.430 601
    -7,203.87     37,938.35    -11,729.38      19,005.10     0.376 919
    -5,415.10    -11,729.38     26,849.75       9,705.27     0.192 480
                                               50,422.20

Residual sum of squares (226 D.F.) = 10^12/50,422.20
                                   = 19.8325 x 10^6

The residual sum of squares is

    $$\sum_h \sum_i b_h' b_i' v_{hi} = \lambda \sum_h b_h' = 1 \Big/ \sum_h \sum_i v^{hi} .$$

This result is an example of a general theorem which we shall encounter
again in Chapter 10. Let V be a p x p matrix of sums of squares and
products with q degrees of freedom; then the reciprocal of the sum of the
elements of its inverse is a sum of squares with q - p + 1 degrees of
freedom.

In this example, q = 228, p = 3.

The residual sum of squares, as is shown in Table 3.17, is 19.8325 x 10^6 with
226 degrees of freedom, giving a residual mean square of 0.087 754 x 10^6.
The estimated variances of the restricted regression coefficients are

    $$V(b_h') = s^2 \Bigl[ v^{hh} - \Bigl(\sum_i v^{hi}\Bigr)^2 \Big/ \sum_j \sum_i v^{ji} \Bigr].$$

Thus,

    $$V(b_1') = 0.087\,754\,(34{,}330.80 - 21{,}711.83^2/50{,}422.20) \times 10^{-6} = 0.002\,192;$$

and similarly

    $$V(b_2') = 0.002\,701, \qquad V(b_3') = 0.002\,192.$$

Hence the standard errors of $b_1'$, $b_2'$, and $b_3'$ are 0.047, 0.052, and
0.047, respectively.
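
As a check on the arithmetic of this example, the following sketch recomputes the restricted coefficients, the residual sum of squares, and the standard errors from the entries of Table 3.16. It assumes only the formulas quoted in this section; the helper `invert_3x3` and the variable names are illustrative, and because the table entries are rounded, the results should agree with Table 3.17 only to about three or four figures.

```python
def invert_3x3(m):
    """Invert a 3 x 3 matrix by the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    cof = [
        [e * i - f * h, -(d * i - f * g), d * h - e * g],
        [-(b * i - c * h), a * i - c * g, -(a * h - b * g)],
        [b * f - c * e, -(a * f - c * d), a * e - b * d],
    ]
    det = a * cof[0][0] + b * cof[0][1] + c * cof[0][2]
    # The inverse is the transposed cofactor matrix divided by the determinant.
    return [[cof[col][row] / det for col in range(3)] for row in range(3)]


# Table 3.16: sums of squares and products of z1, z2, z3 (units of 10^6).
v = [[32.8429, 9.5778, 10.8079],
     [9.5778, 33.2676, 16.4647],
     [10.8079, 16.4647, 46.6167]]

v_inv = invert_3x3(v)                      # units of 10^-6
row_sums = [sum(row) for row in v_inv]
total = sum(row_sums)

coeffs = [r / total for r in row_sums]     # restricted coefficients b_h'
residual_ss = 1.0 / total                  # units of 10^6
s2 = residual_ss / 226                     # residual mean square, 226 D.F.
variances = [s2 * (v_inv[h][h] - row_sums[h] ** 2 / total) for h in range(3)]
std_errors = [var ** 0.5 for var in variances]

print(coeffs, residual_ss, std_errors)
```

Working in units of 10^6 throughout keeps the matrix elements of moderate size; the powers of ten cancel in the coefficients and in the variances.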

This  method  gives  the  restricted  regression,  which  is  what  is  required  in 
practice.  However,  it  may  be  considered  necessary  to  test  the  significance 
of  the  effect  of  the  restriction,  in  which  case  the  unrestricted  regression 
would also have to be fitted. In order to do this, an alternative
transformation may be used.

The restricted regression of $y$ on $x_1$, $x_2$, and $x_3$ is equivalent to
the unrestricted regression of

    $$y - x_3 \quad\text{on}\quad x_1 - x_3 \quad\text{and}\quad x_2 - x_3;$$

that is to say,

    $$Y' - x_3 = b_1'(x_1 - x_3) + b_2'(x_2 - x_3)$$

is equivalent to

    $$Y' = b_1' x_1 + b_2' x_2 + b_3' x_3 .$$

Also the unrestricted regression of $y$ on $x_1$, $x_2$, and $x_3$ is
equivalent to the regression of

    $$y - x_3 \quad\text{on}\quad x_1 - x_3, \quad x_2 - x_3, \quad\text{and}\quad x_3 .$$

Hence, in this last regression, the partial regression of $y - x_3$ on $x_3$
provides a test of the effect of the restriction.

The  transformation  to  the  new  variables  has  been  made,  reducing  the  sums 
of  squares  and  products  to  more  convenient  quantities;  these  are  given  in 
Table  3.18.    The  inverse  matrix  for  the  three  variables  is  given  in  Table  3.19. 


TABLE 3.18

Sums of Squares and Products of Alternative Variables
Derived from Those of Table 3.15

(all entries x 10^6)

               x1 - x3      x2 - x3         x3          y - x3
   x1 - x3     57.8438      28.9219      -28.9219      35.8088
   x2 - x3     28.9219      46.9549      -30.6068      30.1520
   x3         -28.9219     -30.6068   12,061.5343     -19.2219
   y - x3                                              46.6167

TABLE 3.19

Inverse Matrix and Regression Coefficients for
Variables of Table 3.18

         Inverse matrix (x 10^-12)             Regression coefficient
    24,986.93    -15,377.10      20.895              0.430 700
   -15,377.10     30,795.47      41.273              0.377 116
       20.895        41.273      83.0630             0.000 396


The regression coefficients (unrestricted) are those that would be given by
these calculations, except that $b_3$ is now replaced by
$b_3^* = b_1 + b_2 + b_3 - 1$.
For the significance of the effect of the restriction, the sum of squares is

    $$b_3^{*2} / t^{33} = 1900,$$

which is not significant. The significance is determined from the analysis of
variance shown in Table 3.20.

Hence, to accord with logical requirements, the restriction that the sum of
the coefficients equals unity is retained; the restricted coefficients may also be
calculated by means of the formula for the omission of an independent variable:

    $$b_1' = 0.430\,700 - 20.895 \times 0.000\,396/83.0630 = 0.430\,600;$$

    $$b_2' = 0.377\,116 - 41.273 \times 0.000\,396/83.0630 = 0.376\,919;$$

by subtraction,

    $$b_3' = 0.192\,481.$$

Hence the estimation formula is

    $$Y' = 0.431 x_1 + 0.377 x_2 + 0.192 x_3,$$

agreeing with the result found earlier.

TABLE 3.20
Analysis of Variance of y - x3

                                    D.F.    Sum of Squares    Mean Square
   Regression on x1, x2, and x3        3        26,786,000
   Residual                          225        19,830,700         88,140
   Total                             228        46,616,700

   Due to b3*                          1             1,900       1,900(n)
   Restricted regression               2        26,784,100
   Unrestricted regression             3        26,786,000

   (n) Not significant
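
The test of the restriction can be retraced numerically from the entries of Table 3.18. The sketch below is an illustration only: the Gauss-Jordan helper `invert` and the variable names are not from the text, the residual mean square 88,140 is taken from Table 3.20, and rounding in the tabulated sums means the results match the text only approximately.

```python
def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Table 3.18 (units of 10^6): independent variables x1 - x3, x2 - x3, x3.
t = [[57.8438, 28.9219, -28.9219],
     [28.9219, 46.9549, -30.6068],
     [-28.9219, -30.6068, 12061.5343]]
p = [35.8088, 30.1520, -19.2219]          # sums of products with y - x3

t_inv = invert(t)                         # units of 10^-6
coef = [sum(t_inv[h][i] * p[i] for i in range(3)) for h in range(3)]
b3_star = coef[2]                         # departure b1 + b2 + b3 - 1

# Sum of squares for the restriction, converted back to original units.
ss_restriction = b3_star ** 2 / t_inv[2][2] * 1e6
residual_ms = 88140.0                     # residual mean square, Table 3.20
f_ratio = ss_restriction / residual_ms
print(coef, ss_restriction, f_ratio)
```

A variance ratio well below unity, as obtained here, corresponds to the "not significant" entry in Table 3.20.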

As  a  simple  example  of  a  regression  relationship  with  two  restrictions  we  may 
consider  the  calculation  of  an  equation  of  estimation  independent  of  any  linear 
trend  in  the  values.    It  can  be  verified  that  this  would  require  the  restriction 

    $$-b_1'' + b_2'' + 3 b_3'' = 0$$

on the coefficients, in addition to the original restriction. On account of the
restrictions there is effectively now but one regression coefficient, and the
analysis could be carried out as the simple regression (through the origin) of

    $$y - \tfrac{1}{2}(x_1 + x_2) \quad\text{on}\quad x_1 - 2x_2 + x_3,$$

both new variates being adjusted for mean and trend; the regression coefficient
is $b_3''$. The other two regression coefficients are then found as

    $$b_1'' = \tfrac{1}{2} + b_3'',$$
    $$b_2'' = \tfrac{1}{2} - 2 b_3''.$$

The sums of squares and products are then

    $$S[y - \tfrac{1}{2}(x_1 + x_2)]^2 = 21.3165$$
    $$S[y - \tfrac{1}{2}(x_1 + x_2)][x_1 - 2x_2 + x_3] = -0.2022$$
    $$S[x_1 - 2x_2 + x_3]^2 = 129.9758$$

    $$b_3'' = -0.001\,556$$
    $$b_1'' = 0.498\,444$$
    $$b_2'' = 0.503\,112$$
    $$Y'' = 0.498 x_1 + 0.503 x_2 - 0.002 x_3 .$$

The analysis of variance given in Table 3.21 shows that the effect of this
second restriction is significant at the 1 per cent level; if it is necessary to make
an allowance for trend, some worsening of the fit may be expected.
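
The form of the second restriction can be verified as follows. The derivation is a sketch that assumes the five specimens are equally spaced along the board, with $x_1$, $y$, $x_2$, and $x_3$ at positions 1, 2, 3, and 5; the trend parameters $\alpha$ and $\beta$ are introduced here only for the argument.

```latex
% A linear trend contributes \alpha + \beta j to the specimen at position j.
% The estimator b_1 x_1 + b_2 x_2 + b_3 x_3 reproduces the trend at the
% position of y (j = 2) when
\begin{align*}
  b_1(\alpha + \beta) + b_2(\alpha + 3\beta) + b_3(\alpha + 5\beta)
    &= \alpha + 2\beta
    \quad\text{for all } \alpha, \beta, \\
\text{that is,}\qquad
  b_1 + b_2 + b_3 &= 1, \\
  b_1 + 3 b_2 + 5 b_3 &= 2 .
\end{align*}
% Subtracting twice the first condition from the second gives
\begin{align*}
  -b_1 + b_2 + 3 b_3 &= 0 .
\end{align*}
% Adding this to the first condition gives 2 b_2 + 4 b_3 = 1, so that
\begin{align*}
  b_2 = \tfrac{1}{2} - 2 b_3, \qquad
  b_1 = 1 - b_2 - b_3 = \tfrac{1}{2} + b_3 .
\end{align*}
```

Substituting these expressions into the estimator gives $\tfrac{1}{2}(x_1 + x_2) + b_3(x_1 - 2x_2 + x_3)$, which is why only one regression coefficient remains to be estimated.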


TABLE 3.21
Analysis of Variance of y - (1/2)(x1 + x2)

                                    D.F.    Sum of Squares    Mean Square
   Regression on x1 - 2x2 + x3         1               300
   Residual                          227        21,316,200
   Total                             228        21,316,500

   Effect of first restriction         1             1,900
   Second restriction                  1         1,483,600    1,483,600**
   Residual from Table 3.20          225        19,830,700         88,140
   Residual from above               227        21,316,200

   ** Significant at 1 per cent level

CHAPTER    4 


Regression  Equations 
Requiring  Iterative  Calculation 


4.1    GENERAL 

The  regression  equations  considered  so  far  have  been  of  a  simple  type: 
they  have  all  been  linear  in  the  parameters  to  be  estimated,  or  can  be 
reduced  to  linear  form.  This  has  meant  that  the  equations  of  estimation 
of  the  parameters  have  also  been  linear,  and  also  that  exact  significance 
tests  based  on  the  assumption  of  normal  variation  have  been  applicable. 
However, it is often necessary to fit regression equations that are not
linear in their parameters; for these, both the estimation of the
parameters and the tests of significance are more difficult, and, in particular,
exact significance tests are not possible. In this chapter, iterative methods
of calculation suitable for such equations and approximate tests of
significance for the values of the parameters will be presented.

As an example of a nonlinear equation which is reducible to linear form,
consider the following:

    $$Y = b_1(x - c) + b_2(x - c)^2,$$

which is nonlinear in $c$. The equation might be written in this form if
the value of $x$ for which $Y$ is zero, here denoted by $c$, were of particular
interest. This equation is, however, linear in

    $$b_2 c^2 - b_1 c, \qquad 2 b_2 c - b_1, \qquad\text{and}\qquad b_2;$$

for it may be written in the form

    $$Y = b_2 c^2 - b_1 c - (2 b_2 c - b_1)x + b_2 x^2 .$$

Consequently, these three quantities may be estimated by the general
methods for multiple linear regression, and joint fiducial limits for their
true values determined. The question of determining fiducial limits for $c$
will be discussed in Section 6.10.
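
Once the three linear quantities have been estimated, $c$ can be recovered as a value of $x$ at which the fitted quadratic vanishes. The sketch below illustrates this under assumptions: the function name and the parameter values are hypothetical, and since a quadratic generally has two zeros ($c$ and $c - b_1/b_2$), the zero of interest must be identified from the context of the problem.

```python
import math


def zeros_from_quadratic(a0, a1, a2):
    """Return the real values of x at which a0 + a1*x + a2*x**2 is zero."""
    disc = a1 * a1 - 4.0 * a2 * a0
    if disc < 0:
        return []
    root = math.sqrt(disc)
    return sorted([(-a1 - root) / (2.0 * a2), (-a1 + root) / (2.0 * a2)])


# Hypothetical parameter values, used only to construct a test case.
b1, b2, c = 2.0, 0.5, 3.0
a0 = b2 * c ** 2 - b1 * c          # constant term
a1 = -(2.0 * b2 * c - b1)          # coefficient of x
a2 = b2                            # coefficient of x**2

roots = zeros_from_quadratic(a0, a1, a2)
# Here the zero of interest is picked as the one nearest the known c;
# with real data this choice would rest on knowledge of the problem.
c_hat = min(roots, key=lambda r: abs(r - c))
b1_hat = a1 + 2.0 * a2 * c_hat     # from a1 = b1 - 2*b2*c
print(roots, c_hat, b1_hat)
```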
On the other hand, the apparently simpler regression equation

    $$Y = b_2 (x - c)^2$$

cannot be reduced to a form linear in two parameters, and the methods of
this chapter need to be applied in fitting it.

The simplicity of the estimation and testing of regressions linear in
their parameters arises from the fact that (assuming, as usual, normal
variation) there exists a set of jointly sufficient statistics for the parameters;
this makes immediately applicable an analysis of variance, corresponding
to the factorization of the probability into a part depending on the
parameters and another part independent of them.

Only one general type of regression equation will be considered in this
chapter, namely one that is nonlinear in one parameter, the most commonly
occurring case.

The general form considered will be

    $$Y = b_0 + b_1 f(x, c),$$

where $f$ is some nonlinear function of $c$. Typical examples of frequent
occurrence are

(i)    $f(x, c) = e^{cx}$        (and variants $r^x$, $x^c$)

(ii)    $f(x, c) = \log (x - c)$

(iii)    $f(x, c) = (x - c)^t$,        where $t$ is known, and not unity; in
particular $t = -1$.

The extensions to equations that are nonlinear in two or more parameters,
for example,

    $$Y = b_0 + b_1 (x - c)^t \quad (t \text{ unknown}),$$

involve no new principle.

4.2    ESTIMATION  OF  THE  PARAMETERS 

In order to calculate the parameters by iterative methods, it is necessary
first to choose some approximate value of $c$, so that the values of
$f = f(x, c)$ can be calculated. An approximate value may be determined
graphically or by other methods; no general method can be prescribed, but some
methods will be demonstrated in working out the examples given in this
chapter. Hartley (1948) has given some approximate methods that are
suitable in certain cases.
If the adjustment to the preliminary estimate of $c$ is $\delta c$, an improved
value of $f$ is given by

    $$f + \delta c\, f',$$

where $f' = \partial f / \partial c$. Then the regression equation becomes,
to a first approximation,

    $$Y = b_0 + b_1 (f + \delta c\, f') = b_0 + b_1 f + b_2 f',$$

where $b_2 = b_1\, \delta c$.

This is a linear regression on $f$ and $f'$, in which the ratio of the
coefficients $b_2$ and $b_1$ gives the adjustment to $c$. Once the adjusted
value of $c$ has been determined, new values of $f$ and $f'$ need to be
calculated, and the regression must be calculated again. Usually, if the
initial estimate of $c$ has been satisfactory, one or two cycles of iteration
suffice to give all the accuracy required.

The  standard  errors  of  the  coefficients  may  be  determined  in  exactly 
the  same  way  as  for  multiple  linear  regression,  but  it  must  be  remembered 
that  the  standard  errors  are  approximate  only.  They  are  probably 
satisfactory  for  rough  significance  tests;  more  exact  tests  have  been 
proposed  by  Hotelling  (1939)  and  developed  for  the  particular  case  of 
exponential  regression  by  Keeping  (1951),  but  the  increase  in  complexity 
of  these  tests  is  probably  not  warranted  by  the  gain  in  accuracy. 

A simple test of the concordance of a given value of $c$ with the data is
given by the significance of $b_2$, the partial regression coefficient of $y$
on $f'$. This is an exact test since, as the null hypothesis defines the value
of $c$, the regression is a linear regression on two given functions $f$ and
$f'$. There may be more powerful tests, however, since the linear regression
makes no allowance for the nonlinearity of $f$ as a function of $c$.

This test is to be distinguished from the test of significance of the
regression of $y$ on $f$. This is, of course, tested by the significance of
$b_1$: as a total regression coefficient, if $c$ is specified, or as a partial
regression coefficient, if $c$ is not specified but determined from the data.
This latter test is, however, not exact.

Again,  as  for  multiple  linear  regression,  an  analysis  of  variance  may  be 
set  up,  to  give  an  approximate  over-all  test  of  the  regression. 

If $s^2$ is the residual mean square for $y$, and $(t^{hi})$ is the inverse
matrix of sums of squares and products of $f$ and $f'$, then, approximately,

    $$V(b_1) = s^2 t^{11},$$

and

    $$V(b_2) = b_1^2 V(c) = s^2 t^{22},$$

so that

    $$V(c) = s^2 t^{22} / b_1^2 .$$

4.3    FITTING AN EXPONENTIAL REGRESSION

In example (i) from Section 4.1, three different forms of exponential
regression function, which can be reduced to the same form by various
transformations of parameter or independent variable, were given. The
first form is generally to be preferred, provided a table of $e^x$ is
available, as for this form

    $$f' = x f,$$

so that once values of $f$ have been tabulated, values of $f'$ can be written
down with a minimum of calculation. For this form of regression
function,  Stevens  (1951)  and  Pimentel-Gomes  (1953)  have  prepared  tables 
which  enable  the  calculations  to  be  expeditiously  carried  out  if  based  on 
equally  spaced  values  of  x.  However,  to  illustrate  the  general  principles, 
we  give  here  an  example  in  which  the  data  are  unequally  spaced,  so  that 
Stevens'  and  Pimentel-Gomes'  tables  are  not  applicable.  When  data  are 
equally  spaced,  it  is  advantageous  to  use  Pimentel-Gomes'  method,  to 
which  reference  should  be  made. 

Example  4.1  Relation  between  Daily  Molybdenum  Intake  and 
Change  in  Total  Liver  Copper.  Thirty  sheep  were  selected  from  a 
homogeneous  flock  and  divided  at  random  into  six  groups  of  five  animals  each. 
The  animals  were  all  penned  separately,  and  those  of  each  group  were  fed  the 
same  amount  of  molybdenum  in  the  diet,  each  group  receiving  a  different  level 
of  molybdenum.  The  copper  content  of  the  liver  of  each  sheep  was  determined 
by  biopsy,  both  at  the  beginning  of  the  experiment  and  after  27  weeks,  and 
transformed  to  total  liver  copper  by  means  of  an  empirical  equation.  The 
average  daily  molybdenum  intake  and  average  change  in  total  liver  copper  for 
each  group  are  presented  in  Table  4.1.  The  relation  between  molybdenum 
intake  (x)  and  change  in  liver  copper  (y)  was  considered  best  represented  by 
means  of  an  equation  that  gave  a  diminishing  effect  as  the  intake  increased,  so 
the  exponential  relationship 

    $$Y = b_0 + b_1 e^{-cx}$$

was fitted.

Inspection of the first two columns of Table 4.1 showed that the constant
$b_0$ was about -4.8; hence, corresponding to $x$ values of 0.4 and 5.4, the
values of $b_1 e^{-cx}$ are approximately 56.4 and 24.8. The ratio of these
two values gives for a first approximation

    $$e^{5c} = 56.4/24.8 = 2.274,$$

whence $c = 0.164$.

As a trial value, $c$ was taken as 0.165. Values of $f$ and $f'$ for this
value of $c$ are tabulated in columns 3 and 4 of Table 4.1, from which the
calculations follow. The calculations based on the preliminary estimate show
that the standard error of $c$ is 0.059, so that $c$ need be determined to no
more than three decimal places. In this example one cycle of iteration is
sufficient, giving for $c$ a final value of 0.166.

TABLE 4.1

Calculation of Exponential Regression of Change in Total Liver
Copper, y (mg), on Daily Molybdenum Intake, x (mg per day)

f = e^{-cx},   f' = -x e^{-cx}

                                  c = 0.165             c = 0.166
        x      y (mean of 5)      f         f'          f         f'
       0.4         51.6         0.9361    -0.374      0.9358    -0.374
       1.4         53.4         0.7937    -1.111      0.7926    -1.110
       5.4         20.0         0.4102    -2.215      0.4080    -2.203
      19.5         -4.2         0.0401    -0.781      0.0393    -0.766
      48.2         -3.0         0.0004    -0.017      0.0003    -0.016
      95.9         -4.8         0.0000    -0.000      0.0000    -0.000

   Total          113.0         2.1805    -4.498      2.1760    -4.469
   Sum of squares 3,835.63

Calculation of first adjustment (c = 0.165)

   Sums of squares and products (corrected for the mean)
               f             f'             y
      f     0.883 685    -0.537 17       57.6546
      f'   -0.537 17      3.518 7       -34.882

   Inverse matrix
            1.247 4       0.190 43
            0.190 43      0.313 267

   b1 = 65.276
   b2 =  0.0518        δc = 0.0008

Calculation of second adjustment (c = 0.166)

   Sums of squares and products (corrected for the mean)
               f             f'             y
      f     0.882 782    -0.537 95       57.6248
      f'   -0.537 95      3.483 5       -35.201

   Inverse matrix
            1.250 5       0.193 11
            0.193 11      0.316 889

   b1 = 65.262
   b2 = -0.0269        δc = -0.0004

   Regression sum of squares = 57.6248 b1 - 35.201 b2
                             = 3,761.66

The analysis of variance based on the final estimate of $c$ is shown in Table 4.2.
The sums of squares for groups and for regression are derivable from the data
presented here; the sum of squares for error comes from the full analysis of the
individual experimental results. The analysis shows that deviations from the
fitted regression are not significant.

TABLE 4.2
Analysis of Variance of Change in Total Liver Copper

                                 D.F.    Sum of Squares    Mean Square
   Regression                      2         3761.66
   Deviation from regression       3           73.97        24.66(n)
   Total between groups            5         3835.63
   Error (within groups)          24         1108.80        46.20

   (n) Not significant

The standard errors of $b_1$ and $b_2$ are determined in the usual way from the
elements of the inverse matrix. Thus

    S.E. of $b_1 = \sqrt{46.20 \times 1.2505} = 7.60$,
and S.E. of $b_2 = \sqrt{46.20 \times 0.3169} = 3.83$.

Hence S.E. of $c = 3.83/b_1 = 0.059$.

The regression equation is then

    $$Y = -4.85 + 65.3\, e^{-0.166x} .$$
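
The cycle of calculation used in this example can be sketched in Python as follows. This is an illustration rather than the procedure as carried out in the text: it works from the group means of Table 4.1 at full machine precision, repeats the cycle until the adjustment to $c$ is negligible instead of stopping after one or two cycles, and takes the error mean square 46.20 from Table 4.2. The function and variable names are assumptions of this sketch.

```python
import math

x = [0.4, 1.4, 5.4, 19.5, 48.2, 95.9]       # daily molybdenum intake (Table 4.1)
y = [51.6, 53.4, 20.0, -4.2, -3.0, -4.8]    # change in liver copper, group means


def corrected_product(u, v):
    """Sum of products of u and v, corrected for their means."""
    return sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / len(u)


def one_cycle(c):
    """Regress y on f = exp(-c x) and f' = df/dc for a trial value of c."""
    f = [math.exp(-c * xi) for xi in x]
    fp = [-xi * fi for xi, fi in zip(x, f)]
    s11 = corrected_product(f, f)
    s12 = corrected_product(f, fp)
    s22 = corrected_product(fp, fp)
    s1y = corrected_product(f, y)
    s2y = corrected_product(fp, y)
    det = s11 * s22 - s12 * s12
    t11, t12, t22 = s22 / det, -s12 / det, s11 / det   # inverse matrix
    b1 = t11 * s1y + t12 * s2y
    b2 = t12 * s1y + t22 * s2y
    b0 = (sum(y) - b1 * sum(f) - b2 * sum(fp)) / len(x)
    return b0, b1, b2, (t11, t12, t22)


c = 0.165                                    # trial value used in the text
for _ in range(20):
    b0, b1, b2, inverse = one_cycle(c)
    dc = b2 / b1                             # adjustment to c
    c += dc
    if abs(dc) < 1e-7:
        break

b0, b1, b2, (t11, t12, t22) = one_cycle(c)   # final fit at the adjusted c
error_ms = 46.20                             # error mean square, Table 4.2
se_b1 = math.sqrt(error_ms * t11)
se_c = math.sqrt(error_ms * t22) / b1
print(round(c, 4), round(b0, 2), round(b1, 2), round(se_b1, 2), round(se_c, 3))
```

Because the adjustment at each cycle is the ratio $b_2/b_1$, the loop stops when the partial regression on $f'$ has effectively vanished, which is the condition that the current value of $c$ needs no further change.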

4.4    FITTING  A  HYPERBOLIC  REGRESSION 

In Example (iii) listed in Section 4.1, the case when $t = -1$ gives a
hyperbola. The hyperbolic curve gives infinite values for $Y$ when $x = c$;
but usually, for observational data, all the values of $x$ are on the one side
of, and not close to, $c$. If values of $x$ close to $c$ occur, so that very
large values of $Y$ (and hence of $y$) result, it may not be possible to
maintain the assumption that the values of $y$ are homogeneous in their error
variation. For the present discussion we shall assume that all the values being
considered are not too close to $x = c$, and that the errors are
homogeneous.

The curve may be fitted by the general methods outlined, once a
preliminary estimate of $c$ has been chosen. Had we considered the
possibility of points in the neighborhood of $x = c$ being observed, a
good preliminary estimate of $c$ would be some value of $x$ in the
neighborhood of which large values of $y$ were occurring. As this possibility
has been specifically ruled out from the present discussion on account of the
other difficulties it raises, we need to find other ways of estimating $c$.
If, as sometimes happens, it is known that $b_0$ is zero, the regression
equation can be written

    $$Y^{-1} = (x - c)/b_1 .$$

Consequently, if the reciprocal of $y$ is plotted against $x$ and a line drawn
through the points by eye, the intercept of the line on the $x$-axis gives the
first estimate of $c$. Alternatively, the regression equation of $y^{-1}$ on
$x$ may be calculated, but this will usually be an unnecessarily elaborate
means of obtaining $c$.

If $b_0$ is not zero, the equation may be rewritten

    $$xY = b_0 x + cY - b_0 c + b_1;$$

hence, a first approximation to $c$ may be found as the coefficient of $y$ in
the regression of $xy$ on $x$ and $y$. If this is considered too laborious a
means of obtaining a preliminary estimate, a value must be in some way guessed
from a plotted curve, or from previous knowledge of the particular
problem.

Example 4.2 Calibration of a Stormer Viscometer. The Stormer
rotational viscometer measures the viscosity of a liquid in terms of the time
taken for a given number of revolutions of its inner cylinder. The actuating
force is provided by a weight which may be varied. Theoretical considerations
indicate a relationship between viscosity and time, of the form

    $$\eta = (k_1 w - k_2) t,$$

where $\eta$ is the viscosity, $w$ is the actuating weight, and $t$ is the
time. In calibrating the viscometer, liquids of known viscosity are used and
the time for 100 revolutions recorded, so that viscosity here becomes the
independent variable, and time the dependent. Also, different weights (20, 50,
and 100 grams) are used, so that there are two independent variables, but there
is still only one "nonlinear" parameter to be estimated. The regression
equation may be written

    $$Y = b_1 x_1 / (x_2 - c),$$

where now $x_1$ is viscosity, $x_2$ is weight, and $y$ is time.
Putting

    $$f = x_1/(x_2 - c),$$

we have

    $$f' = x_1/(x_2 - c)^2;$$

we now have to determine a regression on $f$ and $f'$, but without any constant
term, so uncorrected sums of squares and products are used.

The data of a calibration experiment are presented in Table 4.3; viscosities
are in centipoises, weights in grams, and times in seconds. The necessary
calculations are shown in and at the foot of the same table. Only one cycle of
calculations is shown; examination of the standard errors shows that one cycle
gives all the accuracy required. A further cycle of iteration does, in fact,
alter the value of $c$ to 2.22.

TABLE 4.3

Calibration of a Stormer Viscometer

f = x1/(x2 - c),   f' = x1/(x2 - c)^2

                                   c = 2.6               c = 2.23
       x1       x2        y        f        f'           f        f'
      14.7      20      35.6     0.845    0.0486       0.827    0.0466
     158.3      20     270.0     9.098    0.5229       8.908    0.5013
      89.7      20     150.8     5.155    0.2963       5.048    0.2841
      75.7      20     121.2     4.351    0.2500       4.260    0.2397
     146.6      20     229.0     8.425    0.4842       8.250    0.4643
      27.5      20      54.3     1.580    0.0908       1.548    0.0871
      42.0      20      75.6     2.414    0.1387       2.364    0.1330
      14.7      50      17.6     0.310    0.0065       0.308    0.0064
     298.3      50     187.2     6.293    0.1328       6.245    0.1307
     158.3      50     101.1     3.340    0.0705       3.314    0.0694
      89.7      50      58.3     1.892    0.0399       1.878    0.0393
      75.7      50      47.2     1.597    0.0337       1.585    0.0332
     161.1      50      92.2     3.399    0.0717       3.372    0.0706
     146.6      50      85.6     3.093    0.0652       3.069    0.0642
      27.5      50      24.3     0.580    0.0122       0.576    0.0121
      42.0      50      31.4     0.886    0.0187       0.879    0.0184
     298.3     100      89.0     3.063    0.0314       3.051    0.0312
     158.3     100      50.3     1.625    0.0167       1.619    0.0166
      89.7     100      30.0     0.921    0.0095       0.917    0.0094
      75.7     100      24.6     0.777    0.0080       0.774    0.0079
     161.1     100      45.1     1.654    0.0170       1.648    0.0169
     146.6     100      41.7     1.505    0.0155       1.499    0.0153
     298.3     100      86.5     3.063    0.0314       3.051    0.0312

   Total
    2796.4    1290    1948.6    65.866    2.4122      64.990    2.3289

   Sum of squares    264,520

Calculation of adjustment (c = 2.6)

   Sums of squares and products (uncorrected)
               f              f'               y
      f    315.379 4      13.928 68        9119.191
      f'    13.928 68      0.726 213        401.5331

   Inverse matrix
            0.020 734 4   -0.397 684
           -0.397 684      9.004 54

   b1 =  29.3977**
   b2 = -10.935(n)        δc = -0.372

   (n) Not significant
   ** Significant at 1 per cent level

The analysis of variance is given in Table 4.4,
from which the standard errors are found to be

    S.E. of $b_1 = \sqrt{39.4 \times 0.0207} = 0.903$
    S.E. of $b_2 = \sqrt{39.4 \times 9.00} = 18.8$
    S.E. of $c = 18.8/b_1 = 0.64$.


TABLE 4.4
Analysis of Variance of Results in Table 4.3

                                    D.F.    Sum of Squares    Mean Square
   Regression                         2         263,692
   Residual                          21             828           39.4
   Total (uncorrected for mean)      23         264,520

The regression equation is

    $$Y = 29.40\, x_1/(x_2 - 2.22).$$

The value 2.22, whose departure from zero is highly significant, may be
interpreted as the weight in grams required to overcome the internal friction
of the viscometer.
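
The calibration fit can be retraced with a short Python sketch. It is offered only as an illustration: it uses the 23 observations of Table 4.3, starts from the trial value $c = 2.6$, fits the regression on $f$ and $f'$ through the origin with uncorrected sums, and repeats the cycle until the adjustment is negligible. The names `data` and `one_cycle` are assumptions of this sketch, and small differences from the tabulated figures are to be expected because the table was computed from rounded values.

```python
# (x1 = viscosity, x2 = weight, y = time) from Table 4.3
data = [
    (14.7, 20, 35.6), (158.3, 20, 270.0), (89.7, 20, 150.8),
    (75.7, 20, 121.2), (146.6, 20, 229.0), (27.5, 20, 54.3),
    (42.0, 20, 75.6), (14.7, 50, 17.6), (298.3, 50, 187.2),
    (158.3, 50, 101.1), (89.7, 50, 58.3), (75.7, 50, 47.2),
    (161.1, 50, 92.2), (146.6, 50, 85.6), (27.5, 50, 24.3),
    (42.0, 50, 31.4), (298.3, 100, 89.0), (158.3, 100, 50.3),
    (89.7, 100, 30.0), (75.7, 100, 24.6), (161.1, 100, 45.1),
    (146.6, 100, 41.7), (298.3, 100, 86.5),
]


def one_cycle(c):
    """Regress y on f and f' through the origin for a trial value of c."""
    f = [x1 / (x2 - c) for x1, x2, _ in data]
    fp = [x1 / (x2 - c) ** 2 for x1, x2, _ in data]
    ys = [row[2] for row in data]
    s11 = sum(a * a for a in f)
    s12 = sum(a * b for a, b in zip(f, fp))
    s22 = sum(b * b for b in fp)
    s1y = sum(a * v for a, v in zip(f, ys))
    s2y = sum(b * v for b, v in zip(fp, ys))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


c = 2.6                        # trial value used in the text
for _ in range(20):
    b1, b2 = one_cycle(c)
    dc = b2 / b1               # adjustment to c
    c += dc
    if abs(dc) < 1e-6:
        break

b1, b2 = one_cycle(c)          # coefficients at the adjusted value of c
print(round(c, 2), round(b1, 2))
```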

4.5    LINEAR  REGRESSION  WITH  ESTIMATED 
WEIGHTS 

When  the  values  of  y  corresponding  to  different  values  of  x  are  subject 
to  errors  of  differing  variance,  it  is  more  efficient  to  use  a  weighted  analysis 
than  to  analyze  the  data  without  weighting.  The  weight  for  any  value  of 
y  will  be  inversely  proportional  to  its  variance.  When  the  variances  are 
unknown,  they  may  be  estimated  from  the  data.  This  is  practicable  if, 
for  example,  there  are  corresponding  to  each  value  of  x  a  sufficient  number 
of  values  of  y.  Alternatively,  and  more  commonly,  it  may  be  assumed 
that the variance is some simple function of $\eta$, the expected value of $y$.
Often  it  is  found  satisfactory  to  assume  the  variance  proportional  either 
to  the  expected  value  or  to  the  square  of  the  expected  value,  but  other 
assumptions  may  be  made  in  special  cases. 

As mentioned in Chapter 2, a weighted regression in which the weights
are not given but are estimated from the data generally requires iterative
calculations. Usually it is postulated that the variance of the observed
values of $y$ is some given function of the expected value. Sometimes the
estimates of variance provide further information about the regression
coefficients, which should be taken into account in any exact analysis.
The method of maximum likelihood¹, by which the values chosen as
estimates of the parameters are those that maximize the probability
density of the observed sample, regarded as a function of the parameters,
has certain optimum properties and in particular enables this additional
information to be taken into account. The method of least squares,
which has hitherto been employed, is appropriate when the variance is
independent of the expected value, so that its estimation does not provide
additional information; it may also be used as an approximate method
even when the variance is a function of the expected value.
We consider a simple linear regression

    $$\eta = \beta_0 + \beta_1 x$$

in which the variance of $y$ is a given function of $\eta$ multiplied by a
scale factor, say

    $$V(y) = \{\sigma \phi(\eta)\}^2 .$$

For the logarithm of the likelihood we have

    $$L = \text{const} - n \log \sigma - S(\log \phi) - \tfrac{1}{2} S(y - \eta)^2/\sigma^2 \phi^2 .$$

Now we may write

    $$\psi = \frac{d \log \phi}{d\eta} = \frac{1}{\phi} \frac{d\phi}{d\eta},$$

so that

    $$\frac{1}{\phi} \frac{\partial \phi}{\partial \beta_0} = \psi, \qquad
      \frac{1}{\phi} \frac{\partial \phi}{\partial \beta_1} = \psi x .$$

1 The maximum-likelihood method (see, for example, Rao (1952)) is generally
formulated as follows: If L is the logarithm of the probability density of the observed
sample, depending on parameters $\theta_1, \theta_2, \ldots$, equations equal in number to the
unknown parameters are derived by equating to zero
$\partial L/\partial\theta_1, \partial L/\partial\theta_2, \ldots$. The estimates
of the parameters derived from these equations are consistent, that is, converging in
probability to the parameter values, and also efficient, that is, asymptotically of minimum
variance. The negative of the matrix of expected values of second derivatives,

$$-\left(E\,\frac{\partial^2 L}{\partial\theta_h\,\partial\theta_i}\right),$$

gives the inverse of the variance-covariance matrix of the estimates. The method
reduces to the method of least squares when the variation is normal, and exact
significance tests based on the normal distribution can be made when the estimating
equations are linear.


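Then

$$L_0 = \frac{\partial L}{\partial\beta_0} = S\psi[(y - \eta)^2/(\sigma\phi)^2 - 1] + S(y - \eta)/(\sigma\phi)^2,$$

$$L_1 = \frac{\partial L}{\partial\beta_1} = S\psi x[(y - \eta)^2/(\sigma\phi)^2 - 1] + Sx(y - \eta)/(\sigma\phi)^2,$$

$$L_\sigma = \frac{\partial L}{\partial\sigma} = -\frac{n}{\sigma} + \frac{1}{\sigma}S(y - \eta)^2/(\sigma\phi)^2.$$

For the second derivatives, only the negatives of the expected values are
required; these expected values are set out in the following matrix:

$$\begin{pmatrix}
S\left[2\psi^2 + \dfrac{1}{(\sigma\phi)^2}\right] & Sx\left[2\psi^2 + \dfrac{1}{(\sigma\phi)^2}\right] & \dfrac{2S\psi}{\sigma} \\
Sx\left[2\psi^2 + \dfrac{1}{(\sigma\phi)^2}\right] & Sx^2\left[2\psi^2 + \dfrac{1}{(\sigma\phi)^2}\right] & \dfrac{2S(\psi x)}{\sigma} \\
\dfrac{2S\psi}{\sigma} & \dfrac{2S(\psi x)}{\sigma} & \dfrac{2n}{\sigma^2}
\end{pmatrix} \tag{4.1}$$

We write w for the estimated value of $1/\phi^2$; then the first two equations
of estimation are

$$S\psi[w(y - Y)^2 - \sigma^2] + Sw(y - Y) = 0$$
$$S\psi x[w(y - Y)^2 - \sigma^2] + Swx(y - Y) = 0,$$

which may be written

$$b_0 Sw + b_1 Swx = Swy + S\psi[w(y - Y)^2 - \sigma^2]$$
$$b_0 Swx + b_1 Swx^2 = Swxy + S\psi x[w(y - Y)^2 - \sigma^2]. \tag{4.2}$$

For comparison, the least-squares equations would be

$$b_0 Sw + b_1 Swx = Swy$$
$$b_0 Swx + b_1 Swx^2 = Swxy,$$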

giving slightly different values, not only for $b_0$ and $b_1$ but also for the
weights w. The second terms on the right-hand side of (4.2) are adjustments
to enable the information provided by the variance estimates
$(y - b_0 - b_1 x)^2$ to be included.

In the information matrix (4.1) the terms in $\psi^2$ are seen to represent the
information on the parameters provided by the variance estimates. Now
we have

$$2\psi^2 + \frac{1}{(\sigma\phi)^2} = \frac{1}{(\sigma\phi)^2}\left\{1 + 2\left[\frac{d(\sigma\phi)}{d\eta}\right]^2\right\},$$

so that the contribution of the variance estimates to the information is
proportional to

$$2\left[\frac{d(\sigma\phi)}{d\eta}\right]^2. \tag{4.3}$$

In general, this quantity is small, and so the information may be ignored.
For example, if

$$\phi = \eta,$$

then $\sigma$ is usually a small fraction; the quantity (4.3) is then $2\sigma^2 \ll 1$.
For  this  reason,  and  also  for  computational  simplicity,  the  least-squares 
equations  rather  than  the  maximum-likelihood  equations  are  generally 
used.  Even  for  these,  some  iteration  is  required,  since  the  weights  must 
be  determined  approximately  from  preliminary  estimates  of  the  expected 
values.  Only  two  cycles  of  iteration  are  usually  required,  however,  since 
great  accuracy  in  the  weights  is  not  necessary  for  giving  accurate  estimates 
of  means  and  regression  coefficients. 
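
The iteration just described can be sketched in a few lines of Python. This is an illustration only, not a calculation from the text: the data are hypothetical, the function names `weighted_line` and `iterated_fit` are assumptions, and the variance is taken to be proportional to the square of the expected value, so that the weights are $1/Y^2$ with Y the fitted value from the previous cycle.

```python
def weighted_line(x, y, w):
    """Weighted least-squares fit of Y = b0 + b1*x; returns (b0, b1)."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx * swx
    b1 = (sw * swxy - swx * swy) / det
    b0 = (swy - b1 * swx) / sw
    return b0, b1


def iterated_fit(x, y, phi=lambda eta: eta, cycles=2):
    """Refit with weights w = 1/phi(Y)^2 taken from the previous cycle."""
    b0, b1 = weighted_line(x, y, [1.0] * len(x))  # preliminary unweighted fit
    for _ in range(cycles):
        fitted = [b0 + b1 * xi for xi in x]
        w = [1.0 / phi(f) ** 2 for f in fitted]   # assumes fitted values are nonzero
        b0, b1 = weighted_line(x, y, w)
    return b0, b1


# Hypothetical data whose scatter grows with the expected value.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [5.2, 7.7, 11.4, 13.6, 17.9, 19.5]
b0, b1 = iterated_fit(x, y)
print(round(b0, 3), round(b1, 3))
```

Passing a different `phi` (for example the square root, for variance proportional to the expected value) changes only the weights; the normal equations are unchanged.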

4.6    ESTIMATION  BY  THE  METHOD  OF  MAXIMUM 
LIKELIHOOD  IN  GENERAL 

As  mentioned  in  Section  4.5,  statistical  parameters  may  be  efficiently 
estimated  in  general  by  the  method  of  maximum  likelihood,  which  provides 
not  only  the  required  estimates  but  also  their  estimated  variances  and 
covariances.  The  method  is  in  most  cases  difficult  to  apply  directly,  but 
the  equations  may  usually  be  solved  expeditiously  by  iterative  means. 

Although  a  detailed  discussion  of  the  method  of  maximum  likelihood 
is  beyond  the  scope  of  this  book,  it  is  of  interest  to  see  how  it  links  up 
with  regression  methods  in  general.  If  L  is  the  logarithm  of  the  likelihood 
of  a  sample  of  independent  observations,  we  have 

$$L = S(l),$$

where the l are the log likelihoods of the members of the sample. If L is
a function of parameters $\theta_1, \theta_2, \ldots$, it is differentiated with respect to
each of these parameters, and the derivatives are equated to zero, to give
equations for estimates of these parameters. The equations may be
written

$$\frac{\partial L}{\partial\theta_i} = S\frac{\partial l}{\partial\theta_i} = 0 \qquad (i = 1, 2, \ldots)$$

or $S(l_i) = 0$, where $l_i = \partial l/\partial\theta_i$.

If trial values of these estimates are chosen, improved values may be
determined by successive approximation. The equations for the adjustments
$e_i$ are

$$e_1 S(l_1 l_i) + e_2 S(l_2 l_i) + \cdots = S(l_i)$$

or generally

$$\sum_h e_h S(l_h l_i) = S(l_i).$$

The equations show that, formally, the $e_i$ may be regarded as regression
coefficients for the regression of 1 on the $l_i$. In exactly the same way as in
ordinary regression analysis, the variances and covariances are given by
the elements of the inverse of the matrix of the $S(l_h l_i)$.

This presentation is actually a slight modification of the method as
usually presented; usually the expected values of the sums of squares and
products $S(l_h l_i)$ are given rather than the sample values. There is
theoretically  little  to  choose  between  the  two  methods,  although  use  of 
expected  values  may  reduce  the  calculations  a  little.  Sample  values  have 
been  used  here  to  show  the  relationship  to  ordinary  regression  analysis. 
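
As an illustration of the adjustment equations, the following Python sketch applies them to a single parameter, the mean $\theta$ of a Poisson sample, for which $l = y \log\theta - \theta$ apart from a constant and $l_1 = y/\theta - 1$. The example, the data, and the function names are assumptions made for this sketch and do not come from the text. With one parameter the adjustment reduces to $e = S(l_1)/S(l_1^2)$, the coefficient of the regression of 1 on $l_1$ through the origin.

```python
def score_step(scores):
    """Adjustment e = S(l_1) / S(l_1^2): regression of 1 on the score."""
    return sum(scores) / sum(s * s for s in scores)


def fit_poisson_mean(y, theta, cycles=50):
    """Iterate theta <- theta + e for the log likelihood l = y*log(theta) - theta."""
    for _ in range(cycles):
        scores = [yi / theta - 1.0 for yi in y]   # l_1 for each observation
        theta += score_step(scores)
    return theta


y = [1, 2, 4, 6, 7]                 # hypothetical counts with mean 4
theta_hat = fit_poisson_mean(y, theta=3.0)
scores = [yi / theta_hat - 1.0 for yi in y]
variance = 1.0 / sum(s * s for s in scores)   # inverse of the sample S(l_1 l_1)
print(theta_hat, variance)
```

Because sample products are used in place of expected values, the size of each step depends on the scatter of the sample; for data much less variable than the assumed distribution the steps may overshoot, and the usual form with expected values would then be the safer choice.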


CHAPTER    5 


Choice  among 
Regression  Formulas 


5.1     GENERAL 

In  previous  chapters  we  have  considered  the  fitting  of  regression 
equations,  simple  or  multiple,  based  on  a  given  set  of  independent  variables 
and  assuming  a  given  form  of  relationship.  However,  it  very  often 
happens  that  the  experimenter  is  faced  with  a  choice  of  two  or  more 
possible  regression  functions.  We  shall  consider  in  this  chapter  how 
different  independent  variables  and  forms  of  regression  functions  may  be 
compared. 

There  are  several  variants  of  the  problem.  First  of  all,  there  may  be 
two  or  more  independent  variables  (or  sets  of  them),  on  which  the 
regression  of  the  dependent  variable  may  be  calculated.  To  test  the 
significance  of  differences  between  different  independent  variables,  some 
work  of  Hotelling  (1940)  is  useful.  A  second  type  of  problem  is  one  in 
which  a  least-squares  regression  on  a  set  of  variables  is  to  be  compared 
with a theoretical formula based on the same or different variables. In
the  third  type  two  or  more  theoretical  formulas  are  compared;  Hoel 
(1947)  and  Williams  and  Kloot  (1953)  have  given  some  results  for  this 
type  of  problem. 

Since  each  variant  introduces  different  logical  problems,  each  will  be 
considered  separately.  It  should  be  borne  in  mind  that,  in  comparing 
different  regression  equations  or  theoretical  formulas,  the  exact  logical 
basis  of  the  test  should  be  made  quite  explicit ;  otherwise  there  is  the  risk 
of  making  erroneous  deductions. 

We  have  already  considered  one  problem  of  choice  among  regression 
formulas  in  Chapter  3,  in  providing  tests  of  significance  in  multiple 
regression.  For  in  testing  the  significance  of  the  contribution  of  any 
variable  to  the  regression,  we  are  enabling  a  decision  to  be  made  whether 
or  not  that  variable  is  worth  including  in  the  equation.  Of  course,  as 
previously emphasized, the decision may be made on grounds other than
that of a significance test; but a test provides a basis for a decision when
no other basis is given. As the test of significance of a regression coefficient
is a straightforward application of the usual procedures of the
analysis of variance, it will not be considered further here.

5.2    COMPARISON  OF  TWO  OR  MORE 
INDEPENDENT  VARIABLES 

When  two  or  more  independent  variables  are  measured,  it  will  generally 
be  appropriate  to  calculate  a  regression  equation  including  such  variables 
as  contribute  significantly  to  the  relationship.  However,  occasions  arise 
when  the  choice  of  one  only  is  to  be  made,  either  for  reasons  of  economy 
in  subsequent  investigations,  or  because  the  additional  contribution  of 
more  than  one  variable  is  not  expected  to  be  important.  In  such  cases  it 
is  desirable  to  have  a  test  whether  the  sums  of  squares  for  regression  on 
different  independent  variables  are  significantly  different. 

The test developed by Hotelling depends on the fact that the sum of
squares for regression of y on $x_i$ is the square of a linear function of y.
His test is of the null hypothesis that these linear functions all have the
same expectation.

We may write

$$z_i = \frac{Sy(x_i - \bar{x}_i)}{\sqrt{S(x_i - \bar{x}_i)^2}} = \frac{p_i}{\sqrt{t_{ii}}}$$

for the square root of the sum of squares for the regression of y on $x_i$.
Then the variances and covariances of the $z_i$ in terms of the residual
variance $\sigma^2$ are as follows:

$$V(z_i) = \sigma^2, \qquad \operatorname{Cov}(z_h, z_i) = r_{hi}\sigma^2,$$

where $r_{hi}$ is the sample correlation coefficient of $x_h$ and $x_i$.

Now let the elements of the inverse of the correlation matrix $(r_{hi})$ be
denoted by $r^{hi}$. Then the mean of the $z_i$ is given by

$$\bar{z} = \sum_h \sum_i r^{hi} z_i \Big/ \sum_h \sum_i r^{hi}$$

and the sum of squares of deviations of the $z_i$ from their mean is therefore

$$\sum_h \sum_i r^{hi}(z_h - \bar{z})(z_i - \bar{z}) = \sum_h \sum_i r^{hi} z_h z_i - \bar{z}^2 \sum_h \sum_i r^{hi}.$$

This sum of squares provides a criterion for the reality of differences
among the $z_i$, and consequently for the differences among the $x_i$ as
predictors for y. It may be tested against the residual mean square from
the multiple regression of y on the $x_i$. The test criterion thus has an F
distribution with $p - 1$ and $n - p - 1$ degrees of freedom.

For demonstrating algebraically the method of arriving at the significance
test, this derivation is satisfactory, but for calculation purposes
some modifications may be introduced to simplify the work. Rather
than calculating the inverse of the matrix $(r_{hi})$, it is preferable to invert the
matrix $(t_{hi})$ of sums of squares and products of the $x_i$. This has the
advantage of avoiding the calculation of the correlation coefficients and
the square roots involved therein; also, the inverse of $(t_{hi})$ is required in
calculating the multiple regression equation and the sum of squares for
regression, and so has to be calculated in any case. Then, if the elements
of the inverse of $(t_{hi})$ are designated $t^{hi}$, we have

$$r^{hi} = t^{hi}\sqrt{t_{hh}t_{ii}}.$$

Hence

$$\sum_h \sum_i r^{hi} z_h z_i = \sum_h \sum_i t^{hi} p_h p_i = \sum_i b_i p_i,$$

which is recognizable as the sum of squares for regression, with p degrees
of freedom.

The quantity $\bar{z}$ is a little more troublesome. We have

$$\sum_h \sum_i r^{hi} z_i = \sum_h \sqrt{t_{hh}} \sum_i t^{hi} p_i = \sum_h \sqrt{t_{hh}}\, b_h,$$

which is simply a weighted sum of the regression coefficients. The divisor of $\bar{z}$ is

$$\sum_h \sum_i r^{hi} = \sum_h \sum_i \sqrt{t_{hh}t_{ii}}\, t^{hi}$$

and so is the most troublesome factor to calculate.

From this derivation it is seen that the test of significance proposed by
Hotelling is equivalent to a test of the adequacy of a single compound
variate $\sum_h \sum_i \sqrt{t_{hh}}\, t^{hi} x_i$ as a regression function to replace the multiple
regression function. The regression sum of squares corresponding to this
variate is $\bar{z}^2 \sum_h \sum_i r^{hi}$; the variate is so constructed that it will agree with the
multiple regression function provided each of the variables $x_i$ is equally
highly correlated with y.

5.3    EXAMPLE 

Example 5.1 The Comparison of Two Independent Variables for
Estimating Maximum Compressive Strength. The most useful applications
of this test are in the comparison of two independent variables. The
following example is taken from some tests carried out to determine the relation
between maximum compressive strength parallel to the grain (y) and density ($x_1$)
for radiata pine. Since some specimens of radiata pine contain a large amount
of resin which contributes to the density but contributes little to the strength, the
resin content of the specimens was determined and an adjusted density figure
($x_2$) calculated. The question arose whether the correlation of y with $x_2$ was
significantly higher than that with $x_1$. The data for this experiment are shown
in Table 5.1.

TABLE 5.1

Values of Maximum Compressive Strength (y), Density (x1),
and Adjusted Density (x2) for 42 Specimens of Pinus Radiata

  Maximum                   Adjusted        Maximum                   Adjusted
Compressive    Density,     Density,      Compressive    Density,     Density,
Strength, y,     x1,          x2,         Strength, y,     x1,          x2,
lb./sq. in.   lb./cu. ft.  lb./cu. ft.    lb./sq. in.   lb./cu. ft.  lb./cu. ft.

   3040          29.2         25.4           3840          30.7         30.7
   2470          24.7         22.2           3800          32.7         32.6
   3610          32.3         32.2           4600          32.6         32.5
   3480          31.3         31.0           1900          22.1         20.8
   3810          31.5         30.9           2530          25.3         23.1
   2330          24.5         23.9           2920          30.8         29.8
   1800          19.9         19.2           4990          38.9         38.1
   3110          27.3         27.2           1670          22.1         21.3
   3160          27.1         26.3           3310          29.2         28.5
   2310          24.0         23.9           3450          30.1         29.2
   4360          33.8         33.2           3600          31.4         31.4
   1880          21.5         21.0           2850          26.7         25.9
   3670          32.2         29.0           1590          22.1         21.4
   1740          22.5         22.0           3770          30.3         29.8
   2250          27.5         23.8           3850          32.0         30.6
   2650          25.6         25.3           2480          23.2         22.6
   4970          34.5         34.2           3570          30.3         30.3
   2620          26.2         25.7           2620          29.9         23.8
   2900          26.7         26.4           1890          20.8         18.4
   1670          21.1         20.0           3030          33.2         29.4
   2540          24.1         23.9           3030          28.2         28.2

The  analysis  shows  that  (as  was  to  be  expected)  the  adjusted  density  gives  the 
higher  sum  of  squares  for  regression.  Tables  5.2  and  5.3  and  the  accompanying 
calculations  show  the  derivation  of  the  quantity  z.  The  analysis  of  variance 
(Table  5.4)  shows  the  sum  of  squares  for  regression  and  its  partition  into  two 
parts,  the  second  providing  the  test  for  difference  of  correlations.  It  is  seen 
that the sum of squares for difference of correlations is not significant, the value
of F being 3.20. However, this is a test of departure from the null hypothesis
in either direction. If, as seems reasonable, we are interested only in departures
giving adjusted density the greater correlation, a one-tailed t test on the difference
would be appropriate. This is equivalent to doubling the probability deemed
significant on the F test. As the F value of 3.20, corresponding to a t value of
1.79, is significant at the 10 per cent point, the difference may be taken as
significant at the 5 per cent point.
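
The arithmetic of this example can be retraced from the sums of squares and products given in Table 5.2 and the total sum of squares in Table 5.4. The Python sketch below is an added illustration; the variable names are assumptions, and the figures it prints should agree with Tables 5.3 and 5.4 only to within rounding.

```python
from math import sqrt

# Sums of squares and products from Table 5.2 (n = 42 specimens).
n = 42
t11, t12, t22 = 828.24, 820.79, 885.58   # corrected S(x1^2), S(x1 x2), S(x2^2)
p1, p2 = 152854.0, 162304.0              # corrected S(x1 y), S(x2 y)
total_ss = 32812400.0                    # corrected S(y^2), from Table 5.4

# Inverse of the 2 x 2 matrix (t_hi).
det = t11 * t22 - t12 * t12
i11, i12, i22 = t22 / det, -t12 / det, t11 / det

# Partial regression coefficients and the regression sum of squares.
b1 = i11 * p1 + i12 * p2
b2 = i12 * p1 + i22 * p2
reg_ss = b1 * p1 + b2 * p2

# Compound variable: weighted sum of the coefficients and its divisor.
num = sqrt(t11) * b1 + sqrt(t22) * b2
den = t11 * i11 + 2.0 * sqrt(t11 * t22) * i12 + t22 * i22
compound_ss = num * num / den

diff_ss = reg_ss - compound_ss             # p - 1 = 1 degree of freedom
resid_ms = (total_ss - reg_ss) / (n - 3)   # n - p - 1 = 39 degrees of freedom
F = diff_ss / resid_ms
print(round(b1, 3), round(b2, 3), round(F, 2))
```

Only the inverse of the matrix of sums of squares and products is needed; the correlation matrix is never formed, which is the computational simplification recommended in Section 5.2.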


TABLE 5.2

Sums of Squares and Products of Values in Table 5.1

                                           Sums of Squares for
          x1          x2          y        Simple Regression

x1      828.24      820.79     152,854         28,209,600
x2      820.79      885.58     162,304         29,746,100


TABLE 5.3

The Inverse Matrix and the Partial Regression Coefficients

        Inverse matrix (elements x 10^-6)           b_h

x1       14,814.84        -13,730.97              35.916
x2      -13,730.97         13,855.60             149.986

$\sum_h \sqrt{t_{hh}}\, b_h = 5497.021$

$\sum_h \sum_i \sqrt{t_{hh}t_{ii}}\, t^{hi} = 1.021249$

Sum of squares for regression on compound variable
    = 5497.021^2/1.021249
    = 29,588,500


TABLE 5.4

Analysis of Variance for Comparing Correlations

                                    D.F.   Sum of Squares   Mean Square

Regression on compound variable       1      29,588,500
Difference of correlations            1         244,700        244,700

Regression on x1 and x2               2      29,833,200
Residual                             39       2,979,200         76,390

Total                                41      32,812,400


5.4    A SPECIAL CASE:    TEST OF SIGNIFICANCE OF
REGRESSION THROUGH ORIGIN

There are numerous special applications and adaptations of Hotelling's
test. One of the more important is in testing the significance of a
regression through the origin. If a regression of the form

$$Y = b_1 x_1 \tag{5.1}$$

is fitted to a set of data, the fact that the line passes through the origin
results in $b_1$'s having a small standard error. However, the standard
error of $b_1$ does not provide the test of significance that is generally
required. What is required is not a test whether $b_1$ differs significantly
from zero but a test whether the relationship (5.1) accounts for the
variation in y significantly better than the assumption that Y is constant;
that is, the alternative regression

$$Y = b_0 x_0,$$

where $x_0 = 1$, so that $b_0 = \bar{y}$.

Hotelling's test is directly applicable here, since there is a choice of two
alternative independent variables, $x_0$ and $x_1$. The result is an F test with
1 and $n - 2$ degrees of freedom, which may be shown to reduce to the form

$$F = \frac{\left[Sx_1 y - Sy\sqrt{Sx_1^2/n}\right]^2}{2\left[Sx_1^2 - Sx_1\sqrt{Sx_1^2/n}\right]s^2},$$

where the sums of squares and products are uncorrected for the mean and
$s^2$ is the residual mean square of y, with $n - 2$ degrees of freedom.

Example 5.2 Significance of Regression through Origin. Good
examples of a test of significance of a regression through the origin are not easy
to find; for if such a regression is indicated, it is in most cases highly significant,
and no test is necessary. The significance of a regression through the origin
would be required with data consisting of a cluster of points at some distance
from the origin, as in the following hypothetical set of data (Table 5.5).

For these data we find the regression coefficient for the line through the origin
to be

$$b_1 = 11{,}243/955 = 11.77$$

with standard error

$$\left(\frac{133{,}772 - 11{,}243^2/955}{5 \times 955}\right)^{1/2} = 0.54.$$

The ratio of $b_1$ to its standard error is highly significant. However, to test the
significance of $b_1$ in the manner described, we determine the residual mean
square (with four degrees of freedom)

$$\tfrac{1}{4}(5876 - 293^2/17.5) = 242.6$$

and then calculate

$$F = \frac{\left[11{,}243 - 876\sqrt{955/6}\right]^2}{2\left[955 - 75\sqrt{955/6}\right] \times 242.6} = 8.58,$$

which is significant at the 5 per cent level.

The result of this significance test shows that it is clearly more appropriate
than the simple test of the departure of $b_1$ from zero.


TABLE 5.5

Hypothetical Data Illustrating Test for Significance
of Regression through Origin

                                                  Sums of Squares and Products
                                         Total     Uncorrected     Corrected

x        10    11    12    13    14    15     75          955          17.5
y       109   134   123   134   179   197    876      133,772          5876
S(xy)                                                  11,243           293
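
The figures of Example 5.2 can be recomputed directly from the six pairs of values in Table 5.5. The Python sketch below is an added illustration with assumed variable names; it follows the same steps as the example (slope through the origin, its standard error, the residual mean square from the ordinary regression, and the F criterion of Section 5.4).

```python
from math import sqrt

# Hypothetical data of Table 5.5.
x = [10, 11, 12, 13, 14, 15]
y = [109, 134, 123, 134, 179, 197]
n = len(x)

# Uncorrected sums, sums of squares, and sum of products.
sx, sy = sum(x), sum(y)
sxx = sum(v * v for v in x)
syy = sum(v * v for v in y)
sxy = sum(a * b for a, b in zip(x, y))

# Line through the origin and the usual standard error of its slope.
b1 = sxy / sxx
se_b1 = sqrt((syy - sxy ** 2 / sxx) / ((n - 1) * sxx))

# Residual mean square from the ordinary regression (n - 2 degrees of freedom).
cxx = sxx - sx ** 2 / n
cyy = syy - sy ** 2 / n
cxy = sxy - sx * sy / n
s2 = (cyy - cxy ** 2 / cxx) / (n - 2)

# Test of the line through the origin against a constant value of Y.
root = sqrt(sxx / n)
F = (sxy - sy * root) ** 2 / (2.0 * (sxx - sx * root) * s2)
print(round(b1, 2), round(se_b1, 2), round(s2, 1), round(F, 2))
```

The contrast between the ratio of `b1` to `se_b1` (over 20) and the much more modest F value is the point of the example.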


5.5    GENERALIZATION  OF  HOTELLING'S  TEST 

The  test  of  significance  described  in  previous  paragraphs  can  readily  be 
generalized  to  multiple  regressions  in  which  a  set  of  independent  variables 
has  been  decided  on  and  the  choice  of  one  of  a  number  of  additional 
variables  is  to  be  made.  The  test  is  then  based  on  the  comparison  of  the 
sums  of  squares  for  partial  regression  on  each  of  the  new  variables,  and  no 
new  principle  is  involved.  If  there  are  q  variables  in  the  equation,  and  one 
more  out  of  p  new  variables  is  to  be  decided  on,  the  significance  will  be 
tested  by  means  of  an  F  with  p  —  1  and  n  —  p  —  q  —  1  degrees  of  freedom. 

Since  the  test  is  based  on  the  comparison  of  the  square  roots  of  regression 
sums  of  squares,  it  will  be  seen  that  it  cannot  be  generalized  for  the 
comparing  of  sets  of  two  or  more  variables.  The  square  root  of  a  sum  of 
squares for one degree of freedom is a linear function of y; the comparison
of such linear functions is amenable to analysis of variance
techniques. A sum of squares with two or more degrees of freedom, on
the other hand, cannot be treated in this way, so that no generalization is
possible.

5.6    COMMENTS ON THE TEST

The  limitations  of  this  test  should  be  noted.  In  the  first  place,  it  is 
strictly  a  conditional  test.  It  is  valid  for  comparing  the  efficiency,  as 
predictors,  of  the  given  sets  of  values  of  the  independent  variables,  without 
reference  to  any  population  from  which  they  might  have  been  drawn.  It 
cannot,  however,  validly  be  extended  to  drawing  conclusions  about 
future  observed  values  of  the  independent  variables. 

In  a  few  applications  it  may  be  reasonable  to  assume  that  the  independent 
variables  are  fixed  and  that  the  conditional  test  is  all  that  is  required. 
In  other  applications  it  will  be  found  that  the  test,  although  not  exact,  is  a 
good  approximation.  In  general,  however,  if  a  test  of  the  efficiency  of 
predictors  for  future  use  is  required,  some  allowance  will  have  to  be  made 
for  the  variation  of  the  values  from  those  observed.  This  question  has 
not  received  much  attention.  An  approximate  test  has  been  derived  for 
comparing  two  predictors,  when  the  variables  are  assumed  drawn  from  a 
normal population. The test criterion, distributed approximately as F
with 1 and $n - 3$ degrees of freedom, is

$$\frac{(z_1 - z_2)^2}{2s^2(1 - r_{12}) + \dfrac{(z_1 + z_2)^2(1 - r_{12})^3}{4(n - 1)(1 + r_{12})}},$$

where $s^2$ is the residual mean square from the regression. This may be
compared with the criterion given by Hotelling's test, namely

$$\frac{(z_1 - z_2)^2}{2s^2(1 - r_{12})}.$$

The additional term in the denominator of the first expression makes
allowance for the variation in the $x_i$.

Thus, in Example 5.1, if the wider hypothesis is being tested, we have

$$z_1 = \sqrt{28{,}209{,}600} = 5311.3$$
$$z_2 = \sqrt{29{,}746{,}100} = 5454.0$$
$$r_{12} = 0.95839$$

and the test criterion becomes

$$\frac{(5311.3 - 5454.0)^2}{2 \times 76{,}390 \times 0.04161 + \dfrac{10{,}765.3^2 \times 0.04161^3}{4 \times 41 \times 1.95839}} = \frac{142.7^2}{6357 + 26} = 3.19.$$

The difference between this result and that given previously is negligible,
probably because of the large value of $r_{12}$ for this example.

If  the  ratio  of  the  variances  is  known,  we  may  make  an  exact  test  to 
compare  two  predictors.  The  test,  when  the  variances  are  equal,  is 
equivalent  to  testing  the  difference  of  the  partial  regression  coefficients. 
Thus 

$$F = \frac{(b_1 - b_2)^2}{(t^{11} - 2t^{12} + t^{22})s^2}$$

with 1 and $n - 3$ degrees of freedom. Since both the predictors in
Example 5.1 are densities, it may be reasonable to assume that the
variances are equal. Applying the test based on that assumption, we find

$$F = \frac{(35.916 - 149.986)^2 \times 10^6}{56{,}132.38 \times 76{,}390} = 3.03.$$

Here,  too,  the  result  differs  little  from  that  of  the  original  test  (F  =  3.20). 
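
The three criteria discussed in this section can be evaluated side by side for Example 5.1. The Python sketch below is an added illustration using the figures of Tables 5.2 and 5.4; the variable names are assumptions, and agreement with the values quoted above should be expected only to within rounding.

```python
from math import sqrt

n = 42
t11, t12, t22 = 828.24, 820.79, 885.58   # from Table 5.2
p1, p2 = 152854.0, 162304.0              # from Table 5.2
s2 = 76390.0                             # residual mean square, Table 5.4

z1, z2 = p1 / sqrt(t11), p2 / sqrt(t22)
r12 = t12 / sqrt(t11 * t22)

# Conditional criterion (Hotelling's test for two predictors).
hotelling = (z1 - z2) ** 2 / (2.0 * s2 * (1.0 - r12))

# Approximate criterion allowing for variation in the x's.
extra = (z1 + z2) ** 2 * (1.0 - r12) ** 3 / (4.0 * (n - 1) * (1.0 + r12))
approx = (z1 - z2) ** 2 / (2.0 * s2 * (1.0 - r12) + extra)

# Equal-variance criterion: difference of the partial regression coefficients.
det = t11 * t22 - t12 * t12
i11, i12, i22 = t22 / det, -t12 / det, t11 / det
b1 = i11 * p1 + i12 * p2
b2 = i12 * p1 + i22 * p2
equal_var = (b1 - b2) ** 2 / ((i11 - 2.0 * i12 + i22) * s2)
print(round(hotelling, 2), round(approx, 2), round(equal_var, 2))
```

The extra term in the denominator of the approximate criterion is small here because $1 - r_{12}$ enters as a cube.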
It  should  be  remarked  that,  for  a  strictly  valid  application  of  Hotelling's 
test,  the  sign  of  the  regression  coefficient  should  be  taken  into  account. 
That  is  to  say,  the  test  is  not  one  of  the  absolute  value  of  the  correlation 
but  of  its  actual  value.  Fortunately,  when  the  test  is  to  be  applied,  there 
is  usually  prior  knowledge  of  whether  each  regression  coefficient  is 
positive  or  negative,  and  the  null  hypothesis  can  be  framed  accordingly. 
Thus,  in  Example  5.1  comparing  density  and  adjusted  density,  the  two 
regression coefficients are known to be positive. If the regressions of y
on, say, x and $x^{-1}$ were being compared, the two coefficients would be
expected to be opposite in sign, and the null hypothesis would be framed
in terms of the regressions on x and $-x^{-1}$. Occasionally, however, there
will be no prior reason to assume that the coefficients are either positive or
negative; to take them all as positive will then bias the test, making the
significance conservative. It would be valid to allot signs at random;
however, the significance attained would not then necessarily be attributable
to difference in correlation, but might be partly attributable to
difference in sign. In other words, the null hypothesis we should like to
test would be that the absolute values of the correlations are equal; all
that Hotelling's test can do is to test the null hypothesis that their actual
values are equal. In most practical cases, of course, this limitation is not
a serious one, for the null hypothesis can be framed in the light of the
expected direction of the regression.

5.7    COMPARISON OF A THEORETICAL AND A
LEAST-SQUARES FORMULA

The  comparison  of  a  theoretical  and  a  least-squares  formula  is  likely 
to  be  of  practical  interest  only  when  the  same  independent  variables  are 
present  in  each,  and  only  then  can  a  satisfactory  significance  test  be  made. 
This  problem  has  already  been  dealt  with  in  Chapter  3  on  multiple 
regression. If the hypothetical formula is

$$\eta = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$$

and the least-squares equation, based on the same variables $x_i$, is

$$Y = b_0 + b_1 x_1 + \cdots + b_p x_p,$$

the sum of squares of deviations of y from $\eta$ can be split into two parts as
follows:

$$S(y - \eta)^2 = S(y - Y)^2 + S(Y - \eta)^2,$$

the degrees of freedom being

$$n = (n - p - 1) + (p + 1).$$

Hence, to test the improvement of Y over $\eta$ (or, what is the same thing,
the adequacy of $\eta$ to represent the relationship), we have an F with $p + 1$
and $n - p - 1$ degrees of freedom:

$$F = \frac{(n - p - 1)\,S(Y - \eta)^2}{(p + 1)\,S(y - Y)^2}.$$

The totality of all sets of values $\beta_i$ for which F calculated in this way is not
significant provides the simultaneous fiducial range for the $\beta_i$.
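
For a single independent variable ($p = 1$) the partition and the F ratio can be sketched as follows in Python. The data, the hypothetical formula $\eta = 1 + 2x$, and the function name `compare_with_theory` are all assumptions chosen for illustration; none of them comes from the text.

```python
def compare_with_theory(x, y, beta0, beta1):
    """F for the adequacy of eta = beta0 + beta1*x against the least-squares line."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    cxx = sum((xi - xbar) ** 2 for xi in x)
    cxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = cxy / cxx
    b0 = ybar - b1 * xbar

    fitted = [b0 + b1 * xi for xi in x]          # least-squares Y
    theory = [beta0 + beta1 * xi for xi in x]    # hypothetical eta
    ss_total = sum((yi - ti) ** 2 for yi, ti in zip(y, theory))
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_dev = sum((fi - ti) ** 2 for fi, ti in zip(fitted, theory))

    p = 1                                        # one independent variable
    F = (n - p - 1) * ss_dev / ((p + 1) * ss_resid)
    return ss_total, ss_resid, ss_dev, F


# Hypothetical data and a hypothetical formula eta = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.1, 5.4, 6.8, 9.5, 10.9, 13.6]
ss_total, ss_resid, ss_dev, F = compare_with_theory(x, y, 1.0, 2.0)
print(ss_total, ss_resid + ss_dev, F)
```

The first two printed values should agree apart from rounding error, since the partition is an algebraic identity whenever $\eta$ is a linear function of the same variables as Y.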

5.8    COMPARISON  OF  TWO  THEORETICAL  FORMULAS 

In  comparing  theoretical  formulas  it  is  easy  to  determine  which  of  a 
number  of  formulas  gives  the  best  prediction  in  the  least-square  sense, 
but  it  is  not  easy  to  establish  valid  significance  tests,  since  in  fact  each 
formula  will  correspond  to  a  different  null  hypothesis.  It  will  not  always 
be  possible  to  frame  a  null  hypothesis  which  will  enable  a  test  between 
different  formulas  to  be  set  up.  It  is  interesting  to  note  that  Hoel  (1947) 
and  Williams  and  Kloot  (1953)  have  both  examined  the  comparison  of 
two  theoretical  formulas,  but  they  arrive  at  different  results  because  they 
are  testing  different  hypotheses. 

Hoel considers the comparison of the original formula for the expected
value of y with an alternative; we shall designate these $f_1$ and $f_2$ respectively.
The test he devises is whether the data are concordant with the assumption
of $f_1$ or whether $f_1$ should be rejected in favor of $f_2$. As he says, the test
"should not be interpreted as a device for selecting one of two alternative
formulas"; the null hypothesis is in fact that $f_1$ is appropriate. Hoel
arrives at the intuitively reasonable criterion, namely the regression of
$y - f_1$ on $f_2 - f_1$. If this is significantly positive, then $f_1$ is to be rejected in
favor of $f_2$.

Williams and Kloot's test is one of the null hypothesis that the two
theoretical formulas are equal in ability to predict y. As might be
expected, the test is symmetrical with respect to $f_1$ and $f_2$, reducing to that
of the regression (through the origin) of $y - \tfrac{1}{2}(f_1 + f_2)$ on $f_2 - f_1$. A
significant positive regression indicates that $f_2$ predicts better than $f_1$,
a significant negative regression that $f_1$ predicts better than $f_2$. A
nonsignificant regression leaves undecided the choice between $f_1$ and $f_2$.

These  two  alternative  possible  tests  are  applicable  to  different  hypotheses, 
but  the  fact  that  such  alternatives  exist  shows  that  the  null  hypothesis 
must  be  clearly  defined  before  a  test  is  made.  For  comparing  more  than 
two  theoretical  formulas,  the  situation  is  even  more  complicated. 

Example  5.3  Comparison  of  Two  Formulas  for  Estimating  the 
Density  of  a  Timber  Specimen  from  That  of  Neighboring  Specimens. 

The  data  used  here  are  from  the  experiment  discussed  in  Section  3.16,  Example 
3.5.  Sets  of  five  end-matched  specimens  are  taken  from  a  number  of  pieces  of 
timber  and  the  density  determined  on  each.  For  purposes  of  subsequent 
experiments  the  density  of  the  second  and  fourth  specimens  in  each  set  must  be 
estimated  from  that  of  the  other  three  specimens.  We  denote  the  values  for 
the  alternate  specimens  by  xlt  x2,  and  x3,  and  that  for  the  specimen  between  xx 
and  x2  by  y.    Two  alternative  formulas  are  suggested  for  estimating  y : 

ji  =  iJV^l    i   x<v 
or  /2  =  \{xx  +  x2  +  x3). 

The  expression  /2  has  the  advantage  of  being  based  on  more  specimens  and 
hence  will  be  less  affected  by  random  variation;   on  the  other  hand,  it  will  be 
biased  by  any  linear  trend  in  the  density  values  along  the  material. 
From  a  set  of  228  results,  the  following  results  were  obtained : 

S{y  -faf  =  110.7650 

S(y  ~/2)2  =  127.1689, 

showing  that/j  may  be  better  than/2. 
The  difference  of  these  two  residual  sums  of  squares  is 

S{2y  -fa  -fa)(fa  -/2)  =  16.4039, 

which  is  seen  to  be  twice  the  sum  of  products  of  y  —  \{fa  +  fa  and/i  —  fa 
The  sum  of  squares  of  fa  —  fa  is 

S(fi  -h?  =  32.50, 
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so  that  the  sum  of  squares  for  the  regression  of  y  —  JC/j  +  /2)  on  fx  —  /2  is 

16.40392/4  x  32.50  =  2.070. 

As  the  residual  mean  square  (obtained  from  the  complete  data)  is  0.4275,  we 
have 

$$F = 2.070/0.4275 = 4.84$$

(significant at the 5 per cent level).

Hence we may conclude that $f_1$ predicts more closely than $f_2$.

Note that for this example the symmetrical test has been applied. Had the
formula $f_1$ been the established one, and the question of replacing it by $f_2$ been
raised, Hoel's test would have been appropriate.
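The arithmetic of the symmetrical test in Example 5.3 can be retraced with a short sketch. The function name and argument names below are illustrative assumptions, not part of the original text; the residual mean square is taken as given, since it comes from the complete data.

```python
def symmetric_comparison_F(ss1, ss2, ss_diff, resid_ms):
    """F ratio (1 d.f. in the numerator) for the symmetrical comparison of two formulas.

    ss1, ss2 : S(y - f1)^2 and S(y - f2)^2
    ss_diff  : S(f1 - f2)^2
    resid_ms : residual mean square supplied from the complete data
    """
    # S(y - f2)^2 - S(y - f1)^2 = S(2y - f1 - f2)(f1 - f2), which is twice
    # the sum of products of y - (f1 + f2)/2 with f1 - f2.
    diff = ss2 - ss1
    sum_products = diff / 2.0
    # Sum of squares for the regression of y - (f1 + f2)/2 on f1 - f2.
    regression_ss = sum_products ** 2 / ss_diff
    return regression_ss, regression_ss / resid_ms


reg_ss, F = symmetric_comparison_F(110.7650, 127.1689, 32.50, 0.4275)
print(round(reg_ss, 3), round(F, 2))
```

With the figures of the example this gives a regression sum of squares close to 2.070 and an F ratio close to 4.84, which must still be referred to the tabular value of F for the appropriate residual degrees of freedom.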

5.9    EXTENSION  TO  THE  COMPARISON  OF  MORE 
THAN  TWO  THEORETICAL  FORMULAS 

We  have  seen  that  there  are  at  least  two  different  possible  tests  of 
significance  for  the  comparison  of  two  theoretical  formulas,  depending 
on  the  emphasis  to  be  given  to  each  formula.  We  have  described  the 
symmetrical  test,  in  which  each  formula  is  on  an  equal  footing,  the 
comparison  being  directly  on  the  goodness  of  fit  of  each,  and  an  unsym- 
metrical  test,  wherein  one  formula  is  the  accepted  one  and  the  question 
of  its  replacement  by  an  alternative  is  considered.  In  the  same  way,  for 
the  comparison  of  three  or  more  theoretical  formulas,  there  are  numerous 
possible  tests;  any  test  should  be  framed  in  the  light  of  the  emphasis 
being  given  initially  to  each  formula.  We  shall  consider  here  only  a 
symmetrical  test,  into  which  each  formula  enters  equally,  none  being 
regarded  initially  as  established  or  accepted. 

As  Williams  and  Kloot  have  shown,  the  regression  test  just  outlined  for 
the  comparison  of  two  formulas  is  equivalent  to  a  test  of  the  significance 
of  the  difference  between  the  residual  sums  of  squares  left  by  each  formula. 
With  more  than  two  formulas  it  seems  appropriate  to  frame  the  test  as  a 
test  of  the  homogeneity  of  the  residual  sums  of  squares.  Such  a  test, 
for  correlated  variables,  has  been  developed  by  Wilks  (1946).  In  this 
particular  application  the  test  simplifies  greatly  because,  as  is  readily  seen, 
the  differences  among  the  residual  sums  of  squares  are  linear  in  the  y 
values. 

It  must  be  pointed  out  that  Wilks's  test,  or  any  other  test  of  homogeneity 
of  variances,  is  not  strictly  applicable  here,  since  the  different  sums  of 
squares  are  not  actually  variance  estimates.  If  one  formula  is  the  "true" 
one,  the  sum  of  squares  of  departures  from  this  formula  is  a  variance 
estimate,  but  all  the  other  sums  of  squares  contain  a  systematic  component 
resulting  from  differences  between  the  formula  applied  and  the  true  one. 
However, the formal application of Wilks's test to the homogeneity of
these residual sums of squares gives some interesting results and confirms
a significance test derived in an intuitive manner. Wilks's test criterion,
with $p$ formulas to be compared, is

$$L = \frac{|v_{hi}|}{\bar{v}^{\,p}\,(1 - r)^{p-1}\,[1 + (p-1)r]} \qquad (5.2)$$

where

$$v_{hi} = S(y - f_h)(y - f_i) = u - p_h - p_i + t_{hi} \quad \text{(say)},$$

$$p\bar{v} = \sum_i v_{ii} = pu - 2\sum_i p_i + \sum_i t_{ii},$$

$$r = \frac{\sum_{h \ne i} v_{hi}}{(p-1)\sum_i v_{ii}}.$$

Both  numerator  and  denominator  are  of  the  second  degree  in  y. 

The numerator is found to be proportional to the sum of squares of
residuals from the least-squares combination

$$f^* = b_1' f_1 + b_2' f_2 + \cdots + b_p' f_p,$$

where the $b_i'$ are subject to the restriction

$$\sum_i b_i' = 1;$$

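in fact, the numerator

$$|v_{hi}| = S(y - f^*)^2 \, |t_{hi}| \sum_h \sum_i t^{hi},$$

where the $t^{hi}$ are the elements of the inverse of the matrix $(t_{hi})$.

The sum of squares in this expression for the numerator has $n - p + 1$
degrees of freedom.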
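Now

$$\bar{v}(1 - r) = \frac{1}{p(p-1)}\Bigl[(p-1)\sum_i v_{ii} - \sum_{h \ne i} v_{hi}\Bigr]$$

$$= \frac{1}{p(p-1)}\Bigl[(p-1)\sum_i t_{ii} - \sum_{h \ne i} t_{hi}\Bigr]$$

$$= \frac{1}{p-1}\, S\Bigl[\sum_i f_i^2 - \Bigl(\sum_i f_i\Bigr)^2 \Big/ p\Bigr]$$

$= \frac{1}{p-1} \times$ the sum of squares between the $f$'s, within
samples, with $n(p - 1)$ degrees of freedom.
This term is, however, independent of the $y$.

Also

$$\bar{v}[1 + (p-1)r] = \frac{1}{p}\Bigl[\sum_i v_{ii} + \sum_{h \ne i} v_{hi}\Bigr]$$

$$= \frac{1}{p}\, S\Bigl[\sum_h (y - f_h)\Bigr]^2$$

$$= p\, S(y - \bar{f})^2,$$

where $\bar{f} = (f_1 + f_2 + \cdots + f_p)/p$.

This sum of squares has $n$ degrees of freedom.
Thus, apart from factors independent of $y$, the criterion (5.2) is the ratio

$$S(y - f^*)^2 / S(y - \bar{f})^2.$$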

The numerator measures departure from the compound formula $f^*$,
so chosen that each formula contributes to it according to its fitness as an
estimate of $y$; in the denominator the departure is from an average
formula, to which each original formula contributes equally. The ratio
appears intuitively to provide a satisfactory test criterion. It may be
tested by means of an $F$ test:

$$F = \frac{n - p + 1}{p - 1} \cdot \frac{S(y - \bar{f})^2 - S(y - f^*)^2}{S(y - f^*)^2}$$

with $p - 1$ and $n - p + 1$ degrees of freedom.

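Before going on to consider the best methods of calculating these sums
of squares, we shall look at the other factors, not involving $y$, in Wilks's
criterion. These are

$$\frac{|t_{hi}| \sum_h \sum_i t^{hi}}{p\,\Bigl\{\dfrac{1}{p(p-1)}\Bigl[(p-1)\sum_i t_{ii} - \sum_{h \ne i} t_{hi}\Bigr]\Bigr\}^{p-1}}.$$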

The greater the variation among the $f_i$ within samples, the smaller, and
hence  the  more  highly  significant,  this  ratio  will  be.  It  is  a  measure  of  the 
consistency  of  the  different  formulas  among  themselves.  It  can,  however, 
be  ignored  for  present  purposes,  since,  first,  the  quantities  involved  are 
fixed  variables  and  so  do  not  generate  a  distribution,  and  second,  because 
it  does  not  add  to  the  information  already  provided  by  the  proposed  test 
criterion,  on  the  concordance  of  the  different  formulas  with  the  observed 
data. 

In order to apply the test we need to calculate both the sums of squares
of departures and the regression coefficients (subject to the restriction
mentioned). These are best determined indirectly, and the following
steps are found suitable.

We work with the $v_{hi}$, the sums of squares and products of the
departures. The sum of squares of departure from $\bar{f}$ is simply the arithmetic
mean of all the $v_{hi}$:

$$S(y - \bar{f})^2 = \frac{1}{p^2} \sum_h \sum_i v_{hi}.$$

The sum of squares of departures from $f^*$ is

$$\sum_h \sum_i b_h' b_i' v_{hi},$$

the $b_i'$ being subject to the restriction $\sum_i b_i' = 1$, and so chosen that the
quadratic form is a minimum. Then it is found that

$$b_i' = \sum_h v^{hi} \Big/ \sum_h \sum_i v^{hi},$$

where the $v^{hi}$ are the elements of the inverse of the matrix $(v_{hi})$.
The sum of squares of departures from $f^*$ is then

$$1 \Big/ \sum_h \sum_i v^{hi},$$

with $n - p + 1$ degrees of freedom.
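The step introduced by "it is found that" can be filled in by the usual Lagrange-multiplier argument. The following is a sketch in matrix notation, which is an assumption of this note rather than the notation of the text: $V = (v_{hi})$, $b = (b_1', \ldots, b_p')^{\mathsf T}$, and $\mathbf{1}$ is the column vector of $p$ ones.

```latex
% Minimize the quadratic form b^T V b subject to 1^T b = 1.
\begin{align*}
\frac{\partial}{\partial b}\Bigl[b^{\mathsf T} V b - 2\lambda(\mathbf{1}^{\mathsf T} b - 1)\Bigr]
   &= 2Vb - 2\lambda\mathbf{1} = 0
   \quad\Longrightarrow\quad b = \lambda V^{-1}\mathbf{1}, \\
\mathbf{1}^{\mathsf T} b = 1
   &\quad\Longrightarrow\quad
   \lambda = \frac{1}{\mathbf{1}^{\mathsf T} V^{-1}\mathbf{1}}
           = \frac{1}{\sum_h \sum_i v^{hi}}, \\
b_i' &= \frac{\sum_h v^{hi}}{\sum_h \sum_i v^{hi}}, \\
b^{\mathsf T} V b &= \lambda^2\, \mathbf{1}^{\mathsf T} V^{-1}\mathbf{1}
   = \lambda = \frac{1}{\sum_h \sum_i v^{hi}}.
\end{align*}
```

The last line is the minimum value of the quadratic form, that is, the sum of squares of departures from $f^*$.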

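For comparing the regression coefficients $b_i'$, the variances and
covariances are given by formulas such as

$$V(b_1') = \sigma^2\Bigl[v^{11} - \Bigl(\sum_h v^{h1}\Bigr)^2 \Big/ \sum_h \sum_i v^{hi}\Bigr]$$

$$= \sigma^2\Bigl(v^{11} - b_1'^{\,2} \sum_h \sum_i v^{hi}\Bigr)$$

and

$$\mathrm{Cov}(b_1', b_2') = \sigma^2\Bigl(v^{12} - \sum_h v^{h1} \sum_h v^{h2} \Big/ \sum_h \sum_i v^{hi}\Bigr)$$

$$= \sigma^2\Bigl(v^{12} - b_1' b_2' \sum_h \sum_i v^{hi}\Bigr).$$

Thus, the estimated covariance of $b_1'$ and $b_2'$ is

$$\frac{v^{12} - \sum_h v^{h1} \sum_h v^{h2} \big/ \sum_h \sum_i v^{hi}}{(n - p + 1) \sum_h \sum_i v^{hi}}
= \frac{1}{n - p + 1}\Bigl(\frac{v^{12}}{\sum_h \sum_i v^{hi}} - b_1' b_2'\Bigr).$$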


Example 5.4 Comparison of Three Theoretical Formulas for
Estimating the Ultimate Failing Strength of Eccentrically Loaded
Columns. A number of theoretical formulas have been derived for estimating
the failing load of columns from knowledge of the eccentricity of the
load, the slenderness ratio of the column, and the strength properties of the
material. In Table 5.6 are set out a series of results given by direct test ($y$), and
by three different formulas ($f_1$, $f_2$, and $f_3$), for specimens of timber of silver
quandong (Elaeocarpus grandis F.v.M.). Table 5.7 gives the required sums of
squares and products (uncorrected), and Table 5.8 the sums of squares and
products, $v_{hi}$, for discrepancies from the different formulas. These figures
indicate that $f_1$ and $f_3$ are superior to $f_2$ as estimates of $y$, but that there may be
little to choose between $f_1$ and $f_3$. In practice, $f_1$ would be used unless an
alternative were chosen for reasons of consistency or convenience.

TABLE 5.6

Observed and Estimated Failing Loads for 33 Columns of Silver
Quandong, Loaded at Nominal Eccentricity 1/120

    y        f1       f2       f3
   Test      -------- Estimated --------
   P/fc     Ps/fc   *ve'Jc    P/fc

  0.434    0.443    0.410    0.467
  0.433    0.431    0.388    0.461
  0.475    0.454    0.411    0.483
  0.432    0.414    0.366    0.449
  0.312    0.308    0.288    0.314
  0.315    0.310    0.285    0.321
  0.310    0.307    0.282    0.318
  0.326    0.321    0.295    0.334
  0.296    0.297    0.268    0.312
  0.311    0.301    0.274    0.315
  0.217    0.224    0.208    0.228
  0.250    0.247    0.228    0.254
  0.241    0.232    0.217    0.235
  0.246    0.244    0.228    0.248
  0.260    0.249    0.228    0.256
  0.256    0.245    0.227    0.250
  0.167    0.166    0.157    0.167
  0.171    0.170    0.159    0.171
  0.175    0.172    0.161    0.173
  0.184    0.178    0.167    0.179
  0.171    0.169    0.158    0.169
  0.124    0.124    0.119    0.123
  0.111    0.113    0.110    0.112
  0.116    0.116    0.112    0.115
  0.107    0.110    0.106    0.109
  0.114    0.116    0.113    0.115
  0.113    0.119    0.115    0.118
  0.065    0.068    0.066    0.066
  0.069    0.069    0.066    0.067
  0.068    0.068    0.066    0.067
  0.071    0.072    0.069    0.070
  0.063    0.065    0.063    0.064
  0.069    0.071    0.068    0.069

TABLE 5.7

Sums of Squares and Products (Uncorrected) of
Results in Table 5.6

           f1          f2          f3          y
  f1    1.929219    1.771904    2.008474    1.958492
  f2    1.771904    1.628134    1.843793    1.798355
  f3    2.008474    1.843793    2.092331    2.039330
  y                                         1.989298

TABLE 5.8

Sums of Squares and Products of Departures from
Theoretical Formulas

                    f1        f2        f3
            f1    1,533     4,355       -50
  10^-6 x   f2    4,355    20,722    -4,594
            f3      -50    -4,594     2,969

Sum of elements = 24,646 x 10^-6
Sum of squares for departure from $\bar{f}$ = 24,646 x 10^-6 / 9 = 2,738.4 x 10^-6

In Table 5.9 the inverse matrix and the regression coefficients are determined.
Inspection of the regression coefficients confirms that $f_1$ is the most, and $f_2$ the
least, satisfactory formula. Table 5.10 gives the adjusted inverse matrix from
which the variances and covariances of the regression coefficients may be found.

TABLE 5.9
Inverse Matrix and Regression Coefficients

                                            Sum         b_i'
   5317.74    -1670.93    -2495.91      1150.90      1.08316
  -1670.93      598.49      897.92      -174.52     -0.16425
  -2495.91      897.92     1684.15        86.16      0.08109
                                        1062.54      1.00000

Sum of squares for departure from $f^*$ = 1/1062.54 = 941.1 x 10^-6

TABLE 5.10
Adjusted Inverse Matrix ($v^{hi} - b_i' \sum_k v^{hk}$)

   4071.13    -1481.90    -2589.23
  -1481.90      569.83      912.07
  -2589.23      912.07     1677.16

TABLE 5.11
Analysis of Variance

                               D.F.    Sum of Squares    Mean Square
                                          (x 10^6)         (x 10^6)
  Difference of regressions      2         1797.3            899**
  Departure from f*             31          941.1             30.36
  Departure from f-bar          33         2738.4

  ** Significant at 1 per cent level

The over-all differences between the formulas are seen, from the analysis of
variance, Table 5.11, to be significant at the 1 per cent level. To test the
difference between $f_2$ and $f_3$, for example, we have

$$b_2' - b_3' = -0.24534$$

$$V(b_2' - b_3') = 30.36(569.83 - 2 \times 912.07 + 1677.16) \times 10^{-6} = 0.012838,$$

so that

$$F = (-0.24534)^2 / 0.012838 = 4.69,$$

significant at the 5 per cent level.
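The computations of Example 5.4 can be retraced from the matrix of Table 5.8 alone. The following is a minimal sketch using only the Python standard library; the function names `invert` and `compare_formulas` are hypothetical, and the small Gauss-Jordan routine is merely one convenient way of obtaining the inverse, not the method used in the text. Because Table 5.8 is rounded, the results agree with Tables 5.9 to 5.11 only to about the accuracy shown there.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def compare_formulas(v, n_obs):
    """Symmetrical test of p formulas from the matrix v of S(y - f_h)(y - f_i)."""
    p = len(v)
    v_inv = invert(v)
    row_sums = [sum(row) for row in v_inv]      # sum over h of v^{hi} (v_inv is symmetric)
    total = sum(row_sums)                       # sum over h and i of v^{hi}
    b = [s / total for s in row_sums]           # coefficients b_i', which sum to 1
    ss_mean = sum(sum(row) for row in v) / p ** 2   # S(y - fbar)^2
    ss_star = 1.0 / total                       # S(y - f*)^2
    s2 = ss_star / (n_obs - p + 1)              # residual mean square
    F = (ss_mean - ss_star) / (p - 1) / s2      # p - 1 and n - p + 1 d.f.
    # Adjusted inverse matrix: v^{hi} - b_i' * (sum over k of v^{hk}).
    adj = [[v_inv[h][i] - b[i] * row_sums[h] for i in range(p)] for h in range(p)]
    return b, ss_mean, ss_star, s2, F, adj


# Table 5.8 (entries in units of 10^-6), 33 columns tested.
V = [[x * 1e-6 for x in row] for row in
     [[1533, 4355, -50], [4355, 20722, -4594], [-50, -4594, 2969]]]
b, ss_mean, ss_star, s2, F, adj = compare_formulas(V, 33)

# Test of the difference between f2 and f3 (indices 1 and 2).
var_diff = s2 * (adj[1][1] - 2 * adj[1][2] + adj[2][2])
F_23 = (b[1] - b[2]) ** 2 / var_diff
print([round(x, 3) for x in b], round(F, 1), round(F_23, 2))
```

The coefficients, the two sums of squares, and the two F ratios obtained in this way should be compared with the entries of Tables 5.9 and 5.11 and with the value 4.69 found above.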


CHAPTER    6 


Estimation  from 

the  Regression  Equation 


6.1    USE  OF  INFORMATION  FROM  REGRESSION 
RELATIONSHIPS 

The  ultimate  purpose  of  most  determinations  of  regression  equations  is 
to  use  them  to  derive  estimates ;  either  estimates  of  the  dependent  variable 
from  values  of  the  independent  variables  (direct  estimation),  of  one  of  the 
independent  variables,  given  values  of  all  the  others  (inverse  estimation), 
or  determination  of  the  regression  coefficients  themselves,  as  constants  of 
proportionality  measuring  the  observable  effect  of  one  variable  on  another. 
It  is  therefore  important  not  only  that  the  regression  equation  be  correctly 
determined  but  also  that  the  correct  method  of  estimation  be  applied. 
This chapter therefore discusses how correct estimates should be made,
some  of  the  difficulties  arising  in  deriving  estimates,  and  some  applications. 

It  has  been  thought  worthwhile  to  lay  considerable  stress  on  the  need 
for  correct  methods  of  estimation,  because  there  appears  to  be  considerable 
confusion  on  the  subject,  even  in  the  statistical  literature.  It  is  believed 
that  there  should  be  no  difficulty  in  resolving  a  particular  problem  provided 
the  actual  setup  of  the  data  is  borne  in  mind  and  the  purpose  for  which  the 
results  are  to  be  used  is  clear.    These  points  will  be  amplified  later. 

One  point  on  which  there  is  sometimes  uncertainty  is  the  regression 
equation  to  be  employed,  that  is,  which  variable  is  to  be  treated  as  the 
dependent variable. It must first be emphasized that the dependent
variable  has  to  be  subject  to  random  error  in  order  that  the  theory  on 
which  the  method  of  estimation  of  the  regression  equation  rests  may  be 
applicable;  a  variable  which  is  errorless,  or  which  has  been  subject  to 
selection  (even  though  also  subject  to  error)  cannot  be  chosen  as  dependent 
variable.  For  example,  in  the  calibration  of  electrical  moisture  meters 
timber  has  to  be  conditioned  to  equilibrium  moisture  content  under  a 
number  of  different  temperature  and  humidity  conditions  and  its  electrical 
resistance determined. Then, since moisture content has been selected
(even though subject to considerable variation at any one condition), it
must  be  taken  as  the  independent  variable.  The  regression  of  resistance 
on  moisture  content  will  be  determined,  even  though,  in  using  the  equation, 
moisture  content  will  be  estimated  from  electrical  resistance. 

On  the  other  hand,  if  the  variable  to  be  estimated  from  the  equation  is 
subject  to  random  error  and  has  not  been  subject  to  selection,  it  should  be 
used  as  the  dependent  variable.  In  particular,  when  both  variables  are 
subject  to  error,  in  simple  regression,  either  regression  equation  may  be 
used,  depending  on  which  variable  is  to  be  predicted.  The  many  problems 
in which it is not possible to do this (especially in calibration experiments
and the like) give rise to the device of inverse estimation from the
regression  equation. 

6.2    THE  INTERPRETATION  OF  FIDUCIAL 
STATEMENTS  ABOUT  A  PARAMETER 

In  this  chapter  we  shall  be  deriving  fiducial  statements  about  the  values 
of  parameters.  The  interpretation  of  these  statements,  which  has  already 
been  discussed  in  Section  1.9,  is  fairly  clear  in  simple  problems.  There 
are,  however,  a  number  of  points  that  need  to  be  considered  carefully  in 
less  straightforward  applications. 

To  begin  with,  a  fiducial  statement  about  a  parameter  is,  broadly 
speaking,  a  statement  that  the  parameter  lies  in  a  certain  range  or  takes  a 
certain  set  of  values.  The  statement  is  either  true  or  false  in  any  particular 
instance,  but  it  is  made  according  to  a  rule  which  ensures  that  such 
statements,  when  applied  in  repeated  sampling,  have  a  given  probability 
(say,  0.95  or  0.99)  of  being  correct.  The  familiar  example  is  that  of 
setting limits to the mean of a normal population. If $\bar{y}$ is the mean and $s$
the standard deviation of a sample of $n$ from a normal population with
mean $\eta$, the $t$ distribution enables us to determine a value of $t$ such that,
with probability $1 - P$, $\bar{y}$ lies within the range given by the inequality

$$\eta - ts/\sqrt{n} < \bar{y} < \eta + ts/\sqrt{n}. \qquad (6.1)$$

The equivalent statement about $\eta$, namely

$$\bar{y} - ts/\sqrt{n} < \eta < \bar{y} + ts/\sqrt{n},$$

may be made with fiducial probability $1 - P$.

It  is  seen  from  this  example  that  any  statement  of  fiducial  probability 
must  be  based  on  a  direct  probability  statement  (such  as  6.1)  about 
statistics  derived  from  a  sample;  consequently,  the  two  statements  are 
equivalent,  being  but  different  ways  of  saying  the  same  thing.  This 
equivalence is clear enough in simple examples such as the one just given.
It appears necessary, however, to consider specifically a slightly more
complicated case in which confusion has arisen.

The case is that of the estimation of a ratio. Suppose that $x_1$ and $x_2$
are two variables, the ratio of whose means is assumed to be $\gamma$. Then
$\xi = x_1 - \gamma x_2$ is a variable with zero mean; we assume that $\xi$ is normally
distributed with unknown variance.

If $\gamma$ is given, the mean and variance of $\xi$ for any sample may be calculated
and the departure of the mean from zero tested. In terms of $x_1$ and
$x_2$, the mean of a sample of $n$ will be

$$\bar{x}_1 - \gamma \bar{x}_2,$$

and the estimated variance of the mean will be

$$(t_{11} - 2\gamma t_{12} + \gamma^2 t_{22})/[n(n - 1)].$$

Accordingly, if $F$ represents the tabular value of the $F$ distribution with 1
and $n - 1$ degrees of freedom at probability level $P$, the inequality

$$(\bar{x}_1 - \gamma \bar{x}_2)^2 < F(t_{11} - 2\gamma t_{12} + \gamma^2 t_{22})/[n(n - 1)] \qquad (6.2)$$

is satisfied with probability $1 - P$. When $\gamma$ is known, this result follows
from ordinary probability theory; it is still true, although not experimentally
verifiable, when $\gamma$ is not known. If, however, the sample values
are given but not $\gamma$, the inequality (6.2) defines a range of values for $\gamma$
corresponding to a fiducial probability $1 - P$. This range will vary from
sample to sample, both in position and extent; it differs from the fiducial
range given by the simpler example in that, for certain values of the
sample statistics, it will be unlimited, admitting all real values of $\gamma$.
Nevertheless, in repeated sampling the various ranges given by (6.2)
include $\gamma$ with probability $1 - P$; this is apparent when it is seen that a
statement about a range for $\gamma$ is equivalent to a direct probability statement
about $\bar{\xi}$ (or $\bar{x}_1 - \gamma \bar{x}_2$).

There is always a finite probability of occurrence of an unlimited
fiducial range for $\gamma$, however large the sample may be. The interpretation
of the unlimited range thus given by some samples is what needs to be
clarified. If (6.2) is replaced by an equality and solved for $\gamma$, its two roots,
when real, give fiducial limits for $\gamma$. When the roots are complex, the
correct interpretation is that the sample puts no limitation on the value of
$\gamma$. It has, however, sometimes been mistakenly assumed that $\gamma$ has some
"fiducial distribution," which in such cases can include complex values.
By referring back to the original inequality (6.2), which clearly involves
real $\bar{x}_1$ and $\bar{x}_2$ and hence only real values of $\gamma$, we see that this conclusion is
incorrect.

We may deduce further facts about the fiducial limits for $\gamma$ defined by
the inequality (6.2). The larger the sample and the more accurately
determined the means, the smaller will be the proportion of samples
setting no limitation on $\gamma$, but there will always be a chance of such
samples occurring. Now, clearly, $\gamma$ is certain to lie in an unlimited range,
so that, for the set of samples putting no limitation on $\gamma$, the fiducial
probability is not $1 - P$ but 1. Since the over-all fiducial probability for
all possible samples is $1 - P$, it follows that, for the samples imposing
some limitation on $\gamma$, the fiducial probability must be less than $1 - P$.
This result has been established rigorously elsewhere (Neyman, 1954).
However, provided the experimenter applies the rule consistently for a
given fiducial probability, the limits he obtains will correspond to the
correct probability.

The following points about these fiducial limits may be noted.

(i) When $\bar{x}_1$ does not differ significantly from zero (i.e., when
$\bar{x}_1^2 < F t_{11}/[n(n - 1)]$), the range for $\gamma$ includes zero.
(ii) Similarly, when $\bar{x}_2$ does not differ significantly from zero (i.e., when
$\bar{x}_2^2 < F t_{22}/[n(n - 1)]$), the range includes infinity, although it excludes a
finite range of values (such a range is sometimes termed exclusive).
(iii) When $\bar{x}_1^2 t^{11} + 2\bar{x}_1 \bar{x}_2 t^{12} + \bar{x}_2^2 t^{22} < F/[n(n - 1)]$, the range for $\gamma$ is
unlimited.

These points are brought out in the diagram (Figure 6.1).
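The three cases can be distinguished mechanically by writing (6.2) as a quadratic inequality in $\gamma$. The sketch below is illustrative only: the function name and the numerical values are assumptions, the tabular value of $F$ must be supplied by the user, and $t_{11}$, $t_{12}$, $t_{22}$ are taken to be the sums of squares and products of deviations from the sample means.

```python
import math


def ratio_fiducial_range(mean1, mean2, t11, t12, t22, n, F):
    """Classify and solve inequality (6.2) for the ratio gamma.

    Returns ("interval", (lo, hi))   gamma lies between lo and hi;
            ("exclusive", (lo, hi))  gamma lies outside the interval (lo, hi);
            ("unlimited", None)      every real gamma satisfies (6.2).
    """
    c = F / (n * (n - 1))
    # (mean1 - g*mean2)^2 - c*(t11 - 2*g*t12 + g^2*t22) < 0  is  A*g^2 + B*g + C < 0.
    A = mean2 ** 2 - c * t22
    B = -2.0 * (mean1 * mean2 - c * t12)
    C = mean1 ** 2 - c * t11
    if A == 0:
        raise ValueError("degenerate case: the inequality is linear in gamma")
    disc = B * B - 4.0 * A * C
    if disc < 0:
        # Complex roots: the sample puts no limitation on gamma (case iii).
        return "unlimited", None
    root = math.sqrt(disc)
    lo, hi = sorted(((-B - root) / (2 * A), (-B + root) / (2 * A)))
    # A > 0: mean2 differs significantly from zero, giving an ordinary interval.
    # A < 0: the range includes infinity and excludes (lo, hi) (case ii).
    return ("interval" if A > 0 else "exclusive"), (lo, hi)


# Hypothetical data: n = 10, so F has 1 and 9 d.f.; 5.12 is the 5 per cent value.
kind_a, range_a = ratio_fiducial_range(5.0, 2.0, 9.0, 0.0, 9.0, 10, 5.12)
kind_b, range_b = ratio_fiducial_range(5.0, 0.5, 9.0, 0.0, 9.0, 10, 5.12)
kind_c, range_c = ratio_fiducial_range(0.5, 0.5, 9.0, 0.0, 9.0, 10, 5.12)
print(kind_a, kind_b, kind_c)
```

In the first call both means are well determined and an ordinary interval containing the sample ratio 2.5 results; in the second the denominator mean is not significant and the range is exclusive; in the third neither mean is significant and the range is unlimited.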

Fiducial  limits  of  this  kind  will  occur  from  time  to  time  throughout 
this  book.  For  instance,  when  the  position  of  the  maximum  or  minimum 
of a fitted parabola is being determined (Section 6.11), we require the ratio
of  two  regression  coefficients;  fiducial  limits  for  the  position  of  the 
maximum  or  minimum  are  then  found  by  a  method  similar  to  that  just 
shown.  When  the  fiducial  limits  do  not  exist,  the  simple  interpretation  is 
that  there  is  no  evidence  for  the  existence  of  a  maximum  or  minimum. 
When  the  fiducial  limits  include  infinity  (case  (ii)),  this  implies  also  that  the 
extreme  value  may  be  either  a  maximum  or  a  minimum,  that  is,  that  the 
curve  is  not  a  parabola  and  there  is  no  evidence  of  departure  from 
linearity.  In  this  respect,  the  determination  of  fiducial  limits  will  give 
the  same  verdict  as  a  direct  significance  test  on  the  parabolic  regression 
coefficient.    This  is  further  discussed  in  Section  6.11. 

[Figure 6.1. Fiducial limits for the ratio $\gamma$, corresponding to different values of $\bar{x}_1$ and
$\bar{x}_2$, but with variances and covariance fixed. Regions marked: limits include zero;
limits include infinity (exclusive); range unlimited.]

Another instance is in testing the choice of the proportions in which
two or more variables are included in a discriminant function, or in
determining fiducial limits for these proportions. A discriminant function
is a linear compound of two or more variables, chosen as a criterion to
distinguish between different groups of individuals; for a discriminant
function to be satisfactory, its sum of squares between groups for any
sample must be a large fraction of the total sum of squares. The test for a
given discriminant function will be discussed more fully in Chapter 10.
It is analogous to the test for a ratio, in that it is a comparison between
a mean square and a residual mean square, each of which is a polynomial
function of the unknown parameter. Suppose that there are two variables,
$x_1$ and $x_2$, and that the discriminant function under test is

$$\xi = x_1 + \gamma x_2.$$

The test is an application of the analysis of covariance (see Chapter 7).
The regressions of $x_1$ on $\xi$, within and between groups, are determined, and
their difference is tested for significance. A significant difference can be
shown to indicate a value of $\gamma$ which differs significantly from the optimum
value, and is thus inconsistent with the data. It will often happen that
to no value of $\gamma$ does there correspond a significant difference, so that all
values of $\gamma$ are consistent with the data, and any linear function of $x_1$ and
$x_2$ is a satisfactory discriminant function.

6.3    FIDUCIAL LIMITS

In direct estimation from a regression equation, the values of $Y$ and
its standard error are determined. Thus, in simple regression

$$Y = b_0 + b_1 x$$

and

$$V(Y) = s^2\Bigl[\frac{1}{n} + \frac{(x - \bar{x})^2}{t_{11}}\Bigr],$$

so that fiducial limits $Y_L$ for $\eta$, the expected value of $Y$, are given by

$$Y_L = Y \pm ts\sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{t_{11}}}. \qquad (6.3)$$

Corresponding to a range of values of $x$, the fiducial limits for $\eta$
generate a hyperbola, which may be termed the fiducial boundary for the
values of $\eta$ (see Figure 6.2). Any point between the two branches of the
hyperbola corresponds to an acceptable value of $\eta$ and its counterpart $x$.

Where inverse estimation is required, a value of $\eta$ is given, and the
corresponding value of $x$ may be obtained from the regression equation:

$$X = (\eta - b_0)/b_1.$$

By substituting the given value of $\eta$ in the equation of the fiducial boundary
(6.3), we get a quadratic equation for $X$. This equation will give two
values for $X$ which may be termed the inverse fiducial limits for $X$, given $\eta$.
It will be noted that, if the regression coefficients are accurately
determined, their variance makes only a negligible contribution to the
standard error of estimate, and hence to the fiducial range in direct or
inverse estimation. In simple regression the variance of the regression
coefficient enters into the expression for the inverse fiducial range in a
term

$$g = \frac{t^2 s^2}{b_1^2 t_{11}},$$

which Finney (1952) has denoted by $g$. He remarks that if $g$ is less than
0.05, it may usually be ignored. Our object in introducing $g$, however, is
not for purposes of approximation but to simplify the expressions for
fiducial limits.

In multiple regression we shall find it convenient to introduce the
symbols

$$g_{hi} = \frac{t^2 s^2}{b_h b_i}\, t^{hi}. \qquad (6.4)$$

The matrix of which $g_{hi}$ is a typical element will be denoted by $G$.
Although it is readily derived from the inverse matrix $T^{-1}$, it is nevertheless
convenient for expressing many of the formulas that arise in inverse
estimation.

In the present instance the inverse fiducial limits are

$$X_L = \bar{x} + \frac{b_1(\eta - \bar{y}) \pm ts\sqrt{\dfrac{b_1^2}{n} + \dfrac{(\eta - \bar{y})^2}{t_{11}} - \dfrac{t^2 s^2}{n t_{11}}}}{b_1^2 - t^2 s^2/t_{11}}$$

$$= \bar{x} + \frac{(X - \bar{x}) \pm \dfrac{ts}{b_1}\sqrt{\dfrac{(X - \bar{x})^2}{t_{11}} + \dfrac{1 - g}{n}}}{1 - g}.$$

Usually  these  limits  will  be  real  and  will  lie  on  either  side  of  the  estimate 
from  the  regression  equation.  But  sometimes,  as  reference  to  Figure  6.2 
shows,  the  limits  may  be  (i)  both  on  the  same  side  of  the  estimate  or  (ii) 
nonexistent.  The  reason  for  these  effects  has  already  been  discussed  in 
Section  6.2. 
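The relation between the direct limits (6.3) and the inverse limits can be checked numerically: each inverse limit should be a value of $x$ at which one branch of the fiducial boundary passes through the given $\eta$. The sketch below assumes simple regression with $b_0 = \bar{y} - b_1 \bar{x}$; the function names and the numerical values are hypothetical, and the tabular $t$ is supplied by the user.

```python
import math


def direct_limits(x, n, xbar, ybar, b1, t11, s, t):
    """Fiducial limits (6.3) for eta at a given x."""
    Y = ybar + b1 * (x - xbar)
    half = t * s * math.sqrt(1.0 / n + (x - xbar) ** 2 / t11)
    return Y - half, Y + half


def inverse_limits(eta, n, xbar, ybar, b1, t11, s, t):
    """Inverse fiducial limits for X given eta, expressed through Finney's g.

    Returns (g, (lo, hi)), or (g, None) when no real limits exist.
    When g > 1 (slope not significant) real roots, if any, bound the
    excluded interval rather than the accepted one.
    """
    g = (t * s) ** 2 / (b1 ** 2 * t11)
    d = (eta - ybar) / b1                  # X - xbar, from the regression equation
    inside = d ** 2 / t11 + (1.0 - g) / n
    if g == 1.0 or inside < 0:
        return g, None
    half = (t * s / abs(b1)) * math.sqrt(inside)
    limits = sorted((xbar + (d - half) / (1.0 - g), xbar + (d + half) / (1.0 - g)))
    return g, tuple(limits)


# Hypothetical values: n = 10 (8 d.f., t = 2.306 at the 5 per cent level).
args = dict(n=10, xbar=5.0, ybar=10.0, b1=2.0, t11=50.0, s=1.0, t=2.306)
g, limits = inverse_limits(14.0, **args)
lo, hi = limits
upper_at_lo = direct_limits(lo, **args)[1]   # upper boundary at the lower inverse limit
lower_at_hi = direct_limits(hi, **args)[0]   # lower boundary at the upper inverse limit
print(round(g, 4), round(lo, 3), round(hi, 3))
```

With a positive, significant slope the upper branch of the boundary meets the line $\eta = 14$ at the lower inverse limit and the lower branch meets it at the upper inverse limit, so `upper_at_lo` and `lower_at_hi` should both reproduce the given $\eta$.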

Some consideration of the properties of the hyperbola is instructive.
First of all, it is seen that the asymptotes are lines with slopes equidistant
on either side of the slope of the regression line. The slopes of these
lines represent the fiducial limits for the regression coefficient $b_1$. When
the standard error of $b_1$ is so small as to be negligible within the range of
application of the equation, the asymptotes converge on the regression
line, and the fiducial boundaries become indistinguishable from a pair of
straight lines equidistant on either side of the regression line.

When the standard error of the regression coefficient is large, the
fiducial boundaries show appreciable curvature; when the regression
coefficient is not significant at the chosen level, the asymptotes have slopes
of opposite sign, so that one branch of the hyperbola will be cut in two
places by some lines parallel to the $x$-axis. Naturally, we would not
usually base fiducial limits on a nonsignificant regression coefficient,
unless we had prior reason for assuming that a regression existed. However,
sometimes the regression will be significant at one level (say 5 per
cent) but not at a higher level (say 1 per cent); 95 per cent fiducial
boundaries would be satisfactory, but 99 per cent boundaries would show
some peculiarities.

Reference to the figure will make these results clear. It is seen that
inverse limits can both lie on the same side of the estimate only when the
regression coefficient is not significant. The interpretation of the result
is that, corresponding to a given value of $\eta$, all values of $X$ are acceptable
except those in the interval shown. On the side of the interval away from
the estimate, the changed sign of the fiducial limit for $\beta_1$ has its effect, in
reducing the significance of the departure.

[Figure 6.2. Linear regression of y on x and fiducial boundaries. The plot shows the
regression line, the 95 per cent fiducial boundaries (inclusive), and the 99 per cent
boundaries, with the nature of the 99 per cent limits marked (exclusive; unlimited range).]

The limits can likewise be apparently complex (actually nonexistent)
only where the regression coefficient is not significant. This means that,
corresponding to the given value of $\eta$, any value of $X$ is acceptable.
Needless to say, where the regression coefficient is not significant, in
practice it would not be used for setting limits on $X$.

It  will  be  seen  that  if  direct  fiducial  limits  can  justifiably  be  calculated, 
inverse  fiducial  limits  are  not  as  satisfactory ;  they  are  somewhat  wider, 
besides,  of  course,  being  about  a  different  regression  line. 

When both variables are random variables, either regression equation
may be calculated, but the appropriate equation is the regression of the
predictand on the predictor. As might be expected, the direct estimate
from the regression equation is more accurate than the inverse estimate
from the alternative equation; this is a direct consequence of the theory
of least squares. However, an alternative demonstration may be helpful.
For instance, even when $Y = \bar{y}$, at which point the variance and hence
the fiducial limits of a direct estimate would be independent of $b_1$, the
limits for the inverse estimate depend on the variance of $b_1$.

We have, when $\eta = \bar{y}$,

$$b_1^2 (X_L - \bar{x})^2 = t^2 s^2 \left[ \frac{1}{n} + \frac{(X_L - \bar{x})^2}{t_{11}} \right],$$

giving

$$X_L = \bar{x} \pm \frac{ts}{\sqrt{n\,(b_1^2 - t^2 s^2 / t_{11})}} = \bar{x} \pm \frac{ts}{b_1 \sqrt{n(1 - g)}},$$

the term in g accounting for the variance in $b_1$.

The wider limits are the price that has to be paid for using the regression
equation for estimating the variable whose role is really that of estimator.
Nevertheless, inverse estimation is the best that can be done in many
situations, in which to use direct estimation would be invalid and would
lead to results of spurious accuracy.


6.4    TOLERANCE  LIMITS

The fiducial limits for $\eta$ discussed in the previous section give the limits
within which the true relationship is likely to lie. Very often the experimenter
is interested also in the limits within which actual values of y,
corresponding to a given value of x, may lie. Such limits, which will
naturally be wider than the fiducial limits, will be called tolerance limits.

Tolerance limits are especially important for inverse estimation, for
here we often have given an observed value of y (not of $\eta$) and need to be
able to set limits on x. Indeed, inverse fiducial limits are seldom required,
because rarely is the regression value $\eta$ given.

The estimated variance of a single value y' of y about the regression line
is

$$s^2 \left[ 1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{t_{11}} \right].$$

Hence, just as for fiducial limits, direct tolerance limits for values of y'
corresponding to a given value of x may be determined. The limits are

$$Y \pm ts \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{t_{11}}}.$$


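These limits will, for varying values of x, generate a tolerance boundary
which is a hyperbola with the same asymptotes as the fiducial boundary.
Inverse limits may also be calculated; thus, corresponding to an observed
value y' of y, we have

$$X_L = X' + \frac{(X' - \bar{x})\,\dfrac{t^2 s^2}{b_1^2 t_{11}} \pm \dfrac{ts}{b_1}\sqrt{1 + \dfrac{1}{n} + \dfrac{(X' - \bar{x})^2}{t_{11}} - \dfrac{(n + 1)\,t^2 s^2}{n\,b_1^2 t_{11}}}}{1 - \dfrac{t^2 s^2}{b_1^2 t_{11}}}$$

$$\phantom{X_L} = X' + \frac{(X' - \bar{x})\,g \pm \dfrac{ts}{b_1}\sqrt{\dfrac{(n + 1)(1 - g)}{n} + \dfrac{(X' - \bar{x})^2}{t_{11}}}}{1 - g}, \qquad (6.5)$$

where X' is the estimate corresponding to y'.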

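More generally, we may have m values of y observed at some fixed but
unknown value of x. The estimated variance of the mean $\bar{y}'$ about the
regression line is then

$$s^2 \left[ \frac{1}{m} + \frac{1}{n} + \frac{(X' - \bar{x})^2}{t_{11}} \right],$$

so that the inverse tolerance limits for the value of x will be

$$X_L = X' + \frac{(X' - \bar{x})\,g \pm \dfrac{ts}{b_1}\sqrt{\left(\dfrac{1}{m} + \dfrac{1}{n}\right)(1 - g) + \dfrac{(X' - \bar{x})^2}{t_{11}}}}{1 - g}.$$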


When  m  is  large,  the  inverse  tolerance  limits  will  approach  the  inverse 
fiducial  limits  discussed  in  Section  6.3. 

It should be mentioned that, strictly speaking, the "inverse tolerance
limits"  discussed  here  are  really  not  tolerance  limits  at  all  but  fiducial 
limits,  since  x  may  be  a  fixed  variable.  However,  it  is  convenient  to 
retain  the  term  "inverse  tolerance  limits,"  as  a  reminder  of  the  fact  that  the 
limits  are  based  on  the  tolerance  boundary  for  the  regression  line  and  not 
on  the  fiducial  boundary. 

Example 6.1 Inverse Estimation of Sodium Concentration from
Flame Photometer Reading. As an application of the formulas given
here we may consider the estimation of sodium concentration by means of the
scale reading on a flame photometer. The data are presented and the analysis
is carried out in Table 2.1.

The regression equation there given is

$$Y = -0.89 + 0.416x,$$

where x is the sodium concentration and y the photometer reading. To be
used for estimating x, the equation becomes

$$X = 2.14 + 2.404y.$$

Usually the sodium concentration will be estimated from a single reading of the
photometer, so that inverse tolerance limits, given by equation (6.5), are required.
For this example, to find the 99 per cent tolerance limits, we have

    t        = 3.499
    s        = 0.948
    $b_1$    = 0.416
    $t_{11}$ = 37,500,

whence g = 0.0017.

Corresponding to y = 80, the estimate of x is 194.4, and the inverse tolerance
limits are

$$194.4 + \frac{69.4 \times 0.0017 \pm (3.499/0.416) \times 0.948 \sqrt{0.13 + 1.11}}{0.9983} = 185.6 \text{ and } 203.4.$$

In  this  example  the  asymmetry  introduced  by  g  is  small,  and  no  great  error 
would  result  from  its  neglect. 
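
The arithmetic of Example 6.1 can be checked with a short Python sketch of equation (6.5). The function name and argument names are illustrative only. The number of calibration points, n = 9, is an assumption inferred from the term 1.11 in the worked figures (it is not stated in this section), and the deviation 69.4 is taken directly from the text.

```python
import math

def inverse_tolerance_limits(x_est, dev, t, s, b1, t11, n, m=1):
    """Inverse tolerance limits for x, in the form of equation (6.5).

    x_est : point estimate X' of x for the observed reading
    dev   : X' minus the mean of the calibration x values
    t     : tabulated t value for the chosen probability level
    s     : residual standard deviation about the regression line
    b1    : fitted slope
    t11   : corrected sum of squares of x
    n     : number of calibration points (assumed, see the lead-in)
    m     : number of readings averaged to give y' (1 for a single reading)
    """
    g = (t * s) ** 2 / (b1 ** 2 * t11)
    if g >= 1.0:
        raise ValueError("slope not significant: real limits do not exist")
    root = math.sqrt((1.0 / m + 1.0 / n) * (1.0 - g) + dev ** 2 / t11)
    half = (t * s / abs(b1)) * root
    lower = x_est + (dev * g - half) / (1.0 - g)
    upper = x_est + (dev * g + half) / (1.0 - g)
    return g, lower, upper

# Figures quoted in Example 6.1; n = 9 is an inferred assumption.
g, lower, upper = inverse_tolerance_limits(
    x_est=194.4, dev=69.4, t=3.499, s=0.948, b1=0.416, t11=37500.0, n=9)
print(round(g, 4), round(lower, 1), round(upper, 1))
```

Passing a larger m shows how averaging several readings narrows the limits toward the inverse fiducial limits of Section 6.3.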

For a regression line passing through the origin the fiducial limits are a pair
of lines also passing through the origin. Thus, in Example 2.1, the line fitted
through the origin is

$$Y = 0.4104x;$$

the fiducial limits for the regression coefficient are

0.4025 and 0.4183,

so that the fiducial boundary is the pair of straight lines

$$Y = 0.4025x \quad \text{and} \quad Y = 0.4183x.$$

The inverse tolerance limits are given by

$$X_L = \frac{X' \pm \dfrac{ts}{b_1}\sqrt{1 - g' + \dfrac{X'^2}{t_{11}'}}}{1 - g'},$$

where $t_{11}'$ is the uncorrected sum of squares of x, and

$$g' = \frac{t^2 s^2}{b_1^2 t_{11}'}.$$

In the example g' is negligible, and the inverse tolerance limits are simply

$$X' \pm \frac{ts}{b_1}\sqrt{1 + \frac{X'^2}{t_{11}'}}.$$

Thus, for y = 80, the estimate of x is 194.9 and the inverse tolerance limits are
found to be

$$194.9 \pm \frac{3.499 \times 0.948}{0.4104}\sqrt{1.213} = 186.0 \text{ and } 203.8.$$

We  note  that,  although  the  values  of  the  limits  are  altered  slightly,  the  range 
differs  little  from  that  based  on  the  regression  not  restricted  to  passing  through 
the  origin.  This  is  so  because  the  point  considered  is  distant  from  the  fixed 
point,  the  origin. 

6.5    EXTENSION  TO  MULTIPLE  REGRESSION 

The methods shown above for the determination of fiducial and
tolerance limits in simple regression are easily applied to multiple regression
problems. With the usual notation, the regression equation is

$$Y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p,$$

the variance of Y being estimated as

$$V(Y) = s^2 \left[ \frac{1}{n} + \sum_h \sum_i t^{hi} (x_h - \bar{x}_h)(x_i - \bar{x}_i) \right].$$

Fiducial limits for $\eta$ are given by

$$Y \pm t \sqrt{V(Y)}.$$

In the same way, tolerance limits for y are given by

$$Y \pm t \sqrt{s^2 + V(Y)}.$$

The determination of inverse fiducial or tolerance limits introduces some
new features. Usually what is required is a pair of limits for one of the
independent variables, corresponding to given values of $\eta$ (or y), and the
given values of each of the other independent variables. In this case the
method is similar to that used for simple regression. However, sometimes
simultaneous limits for two or more of the independent variables may be
required. For example, in some experiments on the feeding of supplements
of molybdenum and sulphate to sheep to determine the effect of the
supplements on the copper stored in the liver, it was required to determine
what levels of each supplement corresponded to zero storage. On the
assumption that the regression of copper storage y on level of supplement
is linear, the relationship between molybdenum level ($x_1$) and sulphate
level ($x_2$) estimated to give zero storage will also be linear; the fiducial
boundary for $X_1$ and $X_2$ will again be a hyperbola. For this particular
example, the relationship between $X_1$ and $X_2$ will be

$$b_0 + b_1 X_1 + b_2 X_2 = 0, \quad \text{or} \quad \bar{y} + b_1 (X_1 - \bar{x}_1) + b_2 (X_2 - \bar{x}_2) = 0.$$

The fiducial boundary for $X_1$ and $X_2$, referred to the point of means as
origin, will be given by the equation

$$(\bar{y} + b_1 X_1 + b_2 X_2)^2 = t^2 s^2 \left( \frac{1}{n} + t^{11} X_1^2 + 2 t^{12} X_1 X_2 + t^{22} X_2^2 \right). \qquad (6.6)$$

This equation may be written

$$b_1^2 X_1^2 (1 - g_{11}) + 2 b_1 b_2 X_1 X_2 (1 - g_{12}) + b_2^2 X_2^2 (1 - g_{22}) + 2 \bar{y} (b_1 X_1 + b_2 X_2) + \bar{y}^2 - \frac{t^2 s^2}{n} = 0,$$

where the $g_{hi}$ are the quantities defined in Section 6.3, equation (6.4).
Its curve is a hyperbola with asymptotes through the point

$$\left[ \frac{\bar{y}}{b_1} \left( \frac{g_{12} - g_{22}}{\Delta} \right), \; \frac{\bar{y}}{b_2} \left( \frac{g_{12} - g_{11}}{\Delta} \right) \right]$$

with slopes

$$-\frac{b_1}{b_2} \left( \frac{1 - g_{12} \pm \sqrt{\Delta}}{1 - g_{22}} \right),$$

where $\Delta = (1 - g_{12})^2 - (1 - g_{11})(1 - g_{22})$.

Again, once the fiducial boundary (6.6) has been defined, there is no
reason why simultaneous fiducial limits, corresponding to given values of
any of the variables, may not be written down. Indeed, we could find
simultaneous limits for $\eta$ (or y) and one of the $x_i$, which would again
generate a hyperbola; these limits would be both direct and inverse. Practical
examples of such simultaneous limits are lacking.

Example 6.2 Inverse Estimation of Alkali Requirement for Pulping
Wood to a Given Lignin Content. Another example where inverse fiducial
limits are required comes from pulping studies of eucalypt woods reported by
Cohen and Mackney (1951). The object is to determine a treatment which
produces pulp of a required lignin content. The percentage of the wood
material soluble in hot water (hot-water solubles, $x_1$) was determined for each
wood sample, which was then divided into four parts, each being pulped with
varying amounts of active alkali ($x_2$ per cent). The same levels are repeated
for each sample, so that the two independent variables, $x_1$ and $x_2$, are
uncorrelated. The lignin content of the resulting pulp was measured in terms of a
"permanganate number," which is known empirically to be roughly related to
lignin content, and its logarithm (y) to base 10 was taken as the dependent
variable. The data are shown in Table 6.1.


TABLE 6.1

Values of Percentage Hot-Water Solubles ($x_1$), Percentage
Active Alkali Used in Pulping ($x_2$), and Log Permanganate
Number (y) for Specimens of Eucalypt Wood

                 y at $x_2$ =
  $x_1$       15        17        19        21

   5.97     1.425     1.250     1.170     1.124
   6.79     1.498     1.330     1.233     1.161
  13.19     1.734     1.535     1.326     1.201
   8.00     1.641     1.418     1.230     1.164
   9.20     1.442     1.255     1.146     1.093
   9.52     1.500     1.281     1.152     1.104
   8.51     1.655     1.384     1.334     1.164
  10.00     1.507     1.332     1.220     1.199
   9.46     1.610     1.425     1.283     1.204
   4.51     1.486     1.272     1.185     1.124
  10.94     1.667     1.458     1.258     1.173
   3.17     1.204     1.130     1.083     1.004
   3.15     1.250     1.146     1.086     1.033
   6.35     1.391     1.207     1.100     1.079
   3.53     1.236     1.149     1.061     1.025

Total of y 22.246    19.572    17.867    16.852

Grand total of y: 76.537

Means: $x_1$ = 7.486, $x_2$ = 18, y = 1.2756

It is appropriate in this example to determine the regression of y on $x_1$ and $x_2$.
However, what is required is to determine an estimate of, and inverse fiducial
limits for, the alkali requirement $X_2$, which will result in a given lignin content
$\eta$, when the hot-water solubles figure $x_1$ is known.

Since in this example the independent variables are uncorrelated, the simple
and the partial regression coefficients are the same. Table 6.2 shows the
calculation of these regression coefficients and the sums of squares for regression.


TABLE 6.2

Calculation of Regression Coefficients from Values in Table 6.1

          Sum of      Sum of Products    Regression               Regression
          Squares     with y             Coefficient              Sum of Squares

$x_1$     516.315      15.5292           +0.030 077 ± 0.0034      0.4671
$x_2$     300         -17.887            -0.059 623 ± 0.0044      1.0665

Total                                                             1.5336

TABLE 6.3

Analysis of Variance

               D.F.    Sum of Squares    Mean Square

Regression       2        1.5336         0.766 8**
Residual        57        0.3308         0.005 804

Total           59        1.8644

** Significant at 1 per cent level

The  analysis  of  variance,  in  Table  6.3,  shows  that  the  regression  effects  are 
highly  significant  and  that  the  residual  variance  is  0.005  804. 
The  regression  equation  is 

$$Y = 2.123 + 0.0301 x_1 - 0.0596 x_2$$

and the 99 per cent fiducial boundary is given by the equation

$$(Y_L - 2.123 - 0.0301 x_1 + 0.0596 x_2)^2 = 7.102 \times 0.005\,804 \left[ \frac{1}{60} + \frac{(x_1 - 7.486)^2}{516.315} + \frac{(x_2 - 18)^2}{300} \right].$$

The lignin content required for the pulp corresponds to a "permanganate
number" of 15 (i.e., $\eta = 1.176$); this value in the equation gives the relationship

$$X_2 = 15.89 + 0.504 x_1,$$

so  that,  once  the  hot-water  solubles  percentage  is  given,  the  requirement  of 
active  alkali  can  be  estimated. 
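
The calculations of Example 6.2 can be retraced from the data of Table 6.1 with the following Python sketch. The function names are illustrative and not from the text; the normal equations are solved in their general two-variable form, although here the cross product of $x_1$ and $x_2$ vanishes by design. The results should agree with Tables 6.2 and 6.3 to within rounding.

```python
import math

# x1 (hot-water solubles) and y at alkali levels 15, 17, 19, 21 (Table 6.1).
SAMPLES = [
    (5.97, (1.425, 1.250, 1.170, 1.124)), (6.79, (1.498, 1.330, 1.233, 1.161)),
    (13.19, (1.734, 1.535, 1.326, 1.201)), (8.00, (1.641, 1.418, 1.230, 1.164)),
    (9.20, (1.442, 1.255, 1.146, 1.093)), (9.52, (1.500, 1.281, 1.152, 1.104)),
    (8.51, (1.655, 1.384, 1.334, 1.164)), (10.00, (1.507, 1.332, 1.220, 1.199)),
    (9.46, (1.610, 1.425, 1.283, 1.204)), (4.51, (1.486, 1.272, 1.185, 1.124)),
    (10.94, (1.667, 1.458, 1.258, 1.173)), (3.17, (1.204, 1.130, 1.083, 1.004)),
    (3.15, (1.250, 1.146, 1.086, 1.033)), (6.35, (1.391, 1.207, 1.100, 1.079)),
    (3.53, (1.236, 1.149, 1.061, 1.025)),
]
ALKALI = (15, 17, 19, 21)

def fit_two_variable_regression(samples, levels):
    """Least-squares fit of y on x1 and x2; returns b0, b1, b2 and residual mean square."""
    rows = [(x1, x2, y) for x1, ys in samples for x2, y in zip(levels, ys)]
    n = len(rows)
    m1 = sum(r[0] for r in rows) / n
    m2 = sum(r[1] for r in rows) / n
    my = sum(r[2] for r in rows) / n
    t11 = sum((r[0] - m1) ** 2 for r in rows)
    t22 = sum((r[1] - m2) ** 2 for r in rows)
    t12 = sum((r[0] - m1) * (r[1] - m2) for r in rows)
    p1 = sum((r[0] - m1) * (r[2] - my) for r in rows)
    p2 = sum((r[1] - m2) * (r[2] - my) for r in rows)
    tyy = sum((r[2] - my) ** 2 for r in rows)
    det = t11 * t22 - t12 ** 2
    b1 = (t22 * p1 - t12 * p2) / det
    b2 = (t11 * p2 - t12 * p1) / det
    b0 = my - b1 * m1 - b2 * m2
    s2 = (tyy - b1 * p1 - b2 * p2) / (n - 3)  # 57 degrees of freedom here
    return b0, b1, b2, s2

def alkali_requirement(eta, x1, b0, b1, b2):
    """Inverse estimate of x2 giving regression value eta at a known x1."""
    return (eta - b0 - b1 * x1) / b2

b0, b1, b2, s2 = fit_two_variable_regression(SAMPLES, ALKALI)
eta = math.log10(15)                       # permanganate number of 15
intercept = alkali_requirement(eta, 0.0, b0, b1, b2)
slope = -b1 / b2                           # change in X2 per unit of x1
print(round(b0, 3), round(b1, 4), round(b2, 4), round(s2, 6))
print(round(intercept, 2), round(slope, 3))
```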

6.6    SOME  DIFFICULTIES

Sometimes  the  independent  variable  is  represented  as  a  polynomial  or 
similar  function  of  the  expected  value  of  the  dependent  variable,  rather 
than  vice  versa;  in  such  cases  it  is  tempting  to  determine  the  regression 
of  the  independent  on  the  dependent  variable.  For  instance,  in  an 
experiment  on  the  calibration  of  a  Stormer  viscometer,  liquids  of  varying 
viscosity  were  prepared,  and  the  time  taken  for  100  revolutions  of  the 
inner  cylinder  of  the  apparatus  was  recorded.  The  theoretical  relationship 
between viscosity ($x_1$) and time (y) is

$$x_1 = a_1 Y - a_{-1}/Y,$$

which is not linear in Y. It is not, however, valid to determine the
regression of $x_1$ on y, since $x_1$ is not a random variable. It is necessary to
rewrite the equation in the form

$$Y = \frac{x_1 + \sqrt{x_1^2 + 4 a_1 a_{-1}}}{2 a_1}$$

or

$$Y = b \left[ x_1 + \sqrt{x_1^2 + c} \right]$$

and to fit it by means of the methods of Chapter 4. Fortunately, in
the experiment considered, it was found that $a_{-1}$ (or c) was not significant.
As a satisfactory relationship could be established with c = 0, this value
was taken, bringing the relationship back to a simple proportionality.
The  modified  relationship  is  fitted  to  some  data  in  Example  4.2. 
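
As a check on the algebra, the rewriting of the viscometer relationship can be verified numerically. In the Python sketch below the function names and the constants are hypothetical, chosen only to show that the rewritten form recovers the time Y from the viscosity, and that it reduces to a proportionality when the second constant is zero.

```python
import math

def viscosity_from_time(y, a1, a_minus1):
    """Theoretical relationship: x1 = a1*Y - a_{-1}/Y."""
    return a1 * y - a_minus1 / y

def time_from_viscosity(x1, a1, a_minus1):
    """Rewritten form: Y = [x1 + sqrt(x1^2 + 4*a1*a_{-1})] / (2*a1)."""
    return (x1 + math.sqrt(x1 ** 2 + 4.0 * a1 * a_minus1)) / (2.0 * a1)

# Hypothetical constants, for illustration only.
a1, a_minus1 = 2.0, 3.0
round_trip = []
for y in (1.5, 4.0, 10.0):
    x1 = viscosity_from_time(y, a1, a_minus1)
    round_trip.append((y, time_from_viscosity(x1, a1, a_minus1)))

# With a_{-1} = 0 (that is, c = 0) the relationship is a simple proportionality.
proportional = time_from_viscosity(8.0, a1, 0.0)
print(round_trip, proportional)
```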

Although  significance  tests  and  fiducial  limits  on  the  constants  of  a 
nonlinear  relationship  are  only  approximate,  they  are  nevertheless  valid, 
as  far  as  the  methods  of  analysis  can  make  them  so.  The  fitting  of  the 
inverse  relationship  on  the  grounds  that  it  is  computationally  convenient 
will,  however,  give  misleading  results. 

6.7    ESTIMATION  FROM  NONLINEAR  EQUATIONS 

We  have  already  seen  in  Chapter  4  how  equations  that  are  not  linear  in 
their  parameters  may  be  fitted  and  that  a  computational  method  can  be 
devised  which  is  effectively  equivalent  to  the  fitting  of  a  linear  regression 
on  the  nonlinear  function  and  its  first  derivative  with  respect  to  the 
nonlinear  parameter.  Although  it  must  be  borne  in  mind  that  all  standard 
errors  and  significance  tests  in  this  work  are  approximate,  nevertheless, 
satisfactory  results  may  be  obtained  by  following  the  procedure  outlined 
in  Chapter  4. 

The regression equation we consider is

$$Y = b_0 + b_1 f(x, c),$$

where $b_0$, $b_1$, and c are coefficients estimated from the data. The variance
of Y may be found approximately by taking differentials, squaring, and
taking expectations:

$$dY = db_0 + f\,db_1 + b_1 f'\,dc = db_0 + f\,db_1 + f'\,db_2,$$

where we put $b_1\,dc = db_2$, after the manner of Chapter 4. From this
point the variance of estimate is found in exactly the same way as from
linear equations. We find, for the estimated variance,

$$V(Y) = s^2 \left\{ \frac{1}{n} + (f - \bar{f})^2 t^{11} + 2 (f - \bar{f})(f' - \bar{f}') t^{12} + (f' - \bar{f}')^2 t^{22} \right\}.$$

Since the quantities $t^{hi}$ and $s^2$ have already been calculated for the
determination of the regression equation, the variance calculation is no
more troublesome than it would be had the equations been linear.

Example  6.3  Variance  of  Estimated  Change  in  Total  Liver  Copper 
(Example  4.1).  Suppose  that  the  estimated  change  in  total  liver  copper, 
corresponding  to  a  daily  molybdenum  intake  of  10  mg.,  is  required. 

From  the  equation  we  have 

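$$Y_{10} = -4.85 + 65.3 e^{-1.66} = -4.85 + 65.3 \times 0.1902 = 7.6$$

$$f - \bar{f} = 0.1902 - 0.3627 = -0.1725$$

$$f' - \bar{f}' = -1.902 + 0.745 = -1.157$$

$$V(Y_{10}) = 46.20 \left( \tfrac{1}{6} + 0.1725^2 \times 1.2505 + 2 \times 0.1725 \times 1.157 \times 0.19311 + 1.157^2 \times 0.316\,889 \right) = 32.58$$

$$\text{S.E.}(Y_{10}) = 5.7.$$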
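
The variance calculation of Example 6.3 can be reproduced with a few lines of Python. The function and argument names are illustrative. The sample size n = 6 is an assumption, inferred from the quoted result of 32.58 rather than stated in this section; the deviations and inverse-matrix elements are those quoted in the example.

```python
import math

def variance_of_estimate(s2, n, dev_f, dev_fprime, c11, c12, c22):
    """Approximate variance of Y = b0 + b1*f(x, c).

    dev_f, dev_fprime : deviations of f and f' from their sample means
    c11, c12, c22     : elements of the inverse matrix (t^11, t^12, t^22)
    """
    quad = (dev_f ** 2 * c11
            + 2.0 * dev_f * dev_fprime * c12
            + dev_fprime ** 2 * c22)
    return s2 * (1.0 / n + quad)

# Values quoted in Example 6.3; n = 6 is an inferred assumption.
v = variance_of_estimate(s2=46.20, n=6, dev_f=-0.1725, dev_fprime=-1.157,
                         c11=1.2505, c12=0.19311, c22=0.316889)
se = math.sqrt(v)
print(round(v, 2), round(se, 1))
```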

6.8    SEPARATING  OUT  EFFECTS  BY  REGRESSION 
ANALYSIS 

In  many  of  the  practical  applications  of  regression  analysis,  the  value  of 
the  technique  lies  not  so  much  in  enabling  one  variable  to  be  predicted 
from  another  as  in  assessing  the  magnitude  of  the  effects  of  one  or  more 
factors,  and  in  separating  out  the  contributions  of  each  factor.  In  such 
applications  the  regression  coefficients  are  of  interest  in  themselves,  and 
not  merely  for  providing  the  coefficients  in  an  equation  of  prediction. 
For  example,  in  an  industrial  process,  it  may  not  be  possible  to  determine 
directly the effects of certain process variables on the quality of the
finished  product.  However,  if  factory  records  provide  values  of  each  of 
these  process  variables  from  time  to  time  and  also  the  corresponding 
values  of  the  product  quality,  regression  analysis  may  be  used  to  derive 
estimates  of  the  effect  of  each  variable. 


6.9    THE  "DAILY  TOTAL"  METHOD 

In  studying  costs  in  logging  and  sawmilling  in  the  timber  industry,  it 
has  been  found  too  laborious  to  record  times  and  other  information  for 
individual  logs.  Even  when  this  has  been  done,  the  problem  of  allocating 
lost  time,  time  for  repairs,  and  other  costs  has  not  been  satisfactorily 
solved.  An  alternative  method  of  determining  costs  has  been  to  record 
the  daily  totals  of  each  item  and  to  determine  the  relation  between  cost 
and  other  factors  as  a  regression  equation.  For  example,  Hasel  (1946) 
has  in  this  way  derived  an  estimate  for  logging  cost  in  terms  of  tree  size 
and  intensity  of  cutting.  For  his  records  he  has  used  not  observations  on 
individual  trees  but  total  daily  cost  and  daily  totals  of  numbers  of  logs  in 
each  size  class,  quantities  that  would  normally  be  recorded  in  any  case. 
He  is  thus  able  to  derive  information  on  costs  without  the  labor  of 
collecting  individual  records.  Schumacher  and  Jones  (1940)  and  Littler 
and  Adkins  (1954)  have  applied  this  "daily  total"  method  of  determining 
the  relation  between  milling  cost  and  dimensions  of  logs  in  studies  of 
sawmill  economics. 

The  daily  total  method  simplifies  the  work  of  the  mill  study  because 
records  can  be  kept  without  interfering  with  the  operation  of  the  mill 
(by  time  keeping,  etc.).  The  total  cost  may  be  reckoned  as  proportional 
to total time worked; total output only, and not output from individual
trees  or  logs,  need  be  recorded.  Indeed,  the  daily  total  method  may  be 
considered  to  give  a  more  realistic  picture  of  mill  or  logging  operation 
than  a  detailed  study  of  individual  logs.  If  it  is  less  accurate  because  it 
records  fewer  details,  it  also  requires  a  smaller  study  crew,  so  that  for  a 
study  of  the  same  cost  more  mills  can  be  studied,  or  more  days  spent  at 
the  one  mill. 
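
The principle of the daily total method can be sketched in Python as follows. Everything in the sketch is hypothetical: two size classes of log, a total cost taken to be a sum of per-log costs with no fixed daily charge, and made-up, error-free daily records constructed from per-log costs of 2 and 5 units. It is meant only to show that per-log costs can be recovered from daily totals, without records for individual logs.

```python
def fit_per_log_costs(days):
    """Least-squares fit of daily cost = c_small*n_small + c_large*n_large.

    days : list of (n_small, n_large, total_cost) daily records.
    The model has no intercept, i.e. no fixed daily charge (an assumption).
    """
    s11 = sum(ns * ns for ns, nl, cost in days)
    s12 = sum(ns * nl for ns, nl, cost in days)
    s22 = sum(nl * nl for ns, nl, cost in days)
    p1 = sum(ns * cost for ns, nl, cost in days)
    p2 = sum(nl * cost for ns, nl, cost in days)
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("daily log counts are proportional; costs not separable")
    c_small = (s22 * p1 - s12 * p2) / det
    c_large = (s11 * p2 - s12 * p1) / det
    return c_small, c_large

# Hypothetical daily totals: (small logs, large logs, total cost for the day).
DAYS = [(10, 4, 40.0), (6, 8, 52.0), (12, 3, 39.0), (8, 8, 56.0), (5, 10, 60.0)]
c_small, c_large = fit_per_log_costs(DAYS)
print(c_small, c_large)
```

With real records the daily totals would contain error, and the standard errors of the two coefficients would be obtained from the residual mean square in the usual way.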

An  interesting  application  of  a  method  the  same  in  principle  as  the 
daily  total  method  is  given  by  Day  (1937).  She  has  derived  equations  for 
estimating  the  cost  of  hauling  logs,  in  terms  of  the  diameter  and  diameter 
squared.  Seventy  truckloads  of  logs,  for  each  of  which  the  haulage  cost 
was  the  same,  were  examined.  For  each,  the  number  of  logs  and  the 
totals  of  diameters  and  diameters  squared  were  recorded.  A  relation 
giving  cost  was  determined  by  regression  analysis.  The  success  of  the 
method  is  indicated  by  the  fact  that,  using  the  empirical  formula,  the 
haulage cost of any truckload was estimated with a standard error of 10
per  cent. 

Needless  to  say,  the  applications  of  this  and  similar  methods  are  not 
confined  to  forestry  operations  and  should  have  wide  usefulness  in  factory 
operations  and  costing. 

6.10    ESTIMATION  OF  ZEROS,  ETC.,  OF  A 
FITTED  CURVE 

Having  fitted  a  polynomial  regression,  we  are  often  interested  in 
determining  the  value  of  the  independent  variable  for  which  the  estimate 
of  the  dependent  variable  vanishes,  that  is,  in  finding  the  zeros  of  Y.  It  is 
just as easy to find the values of x for which Y takes any other given value,
say a, for we then seek simply the zeros of Y - a.

This  problem  is  one  of  inverse  estimation:  given  a  value  of  the 
dependent  variable,  to  find  the  corresponding  values  of  the  independent 
variables.  It  differs  from  inverse  estimation  in  multiple  regression, 
however,  in  that  the  required  values  of  the  independent  variable  are  a  set 
of  points  rather  than  a  single  point,  line,  or  plane,  etc. 

If the regression equation is

$$Y = b_0 + b_1 x + b_2 x^2 + \cdots + b_p x^p,$$

the zeros are the roots of the equation

$$b_0 + b_1 x + b_2 x^2 + \cdots + b_p x^p = 0. \qquad (6.7)$$

This  equation  will  have  p  roots  in  general,  but  an  even  number  of  them 
may  be  complex.  From  a  practical  point  of  view,  the  complex  roots  are 
of  no  interest.  It  may  happen,  though,  that  the  roots  found  complex  in 
the  sample  correspond  to  real  roots  in  the  population;  that  is,  they  are 
complex  as  a  result  of  sampling  errors  in  the  coefficients  of  the  equation 
(6.7).  Hence  it  is  of  interest  to  determine  fiducial  limits  for  the  zeros  of 
the  equation.  If  the  fiducial  limits  found  are  real,  even  when  the  root 
itself  is  complex,  there  is  evidence  for  the  existence  of  a  couple  of  real 
roots.  This  will  be  clear  from  inspection  of  Figure  6.3,  which  gives  a 
parabola  and  its  fiducial  boundaries,  placed  in  varying  relationship  to  the 
X-axis.  In  (i)  we  see  a  curve  with  real  zeros,  each  of  which  has  real 
fiducial  limits.  In  (ii),  the  curve  has  real  zeros,  but  one  pair  of  fiducial 
limits  is  complex;  in  practice  this  means  that  the  inner  limits  coalesce, 
and  there  is  also  the  possibility  that  the  zero  does  not  exist  (i.e.,  the 
population  roots  are  not  real).  In  (iii),  the  curve  has  no  real  zeros,  but  a 
pair  of  fiducial  limits  exists,  indicating  the  possibility  of  real  zeros  in  the 
population.  In  elucidating  such  a  problem,  a  diagram  such  as  that  given 
is  often  more  satisfactory  than  a  purely  algebraic  discussion. 

6.11     MAXIMA,  ETC.,  OF  A  FITTED  CURVE

Sometimes  a  parabolic  or  other  polynomial  curve  is  fitted  to  data  with 
the  object  of  determining  the  maximum  or  minimum  value  taken  by  the 
dependent  variable  and  the  value  of  the  independent  variable  for  which 
it  occurs.  A  detailed  account  of  the  determination  of  maxima  is  given  by 
Hotelling  (1941),  whose  paper  is  a  model  of  investigation.  Hotelling 
discusses  the  question  of  allocation  of  experimental  points  for  most 
accurate  estimation  of  maxima ;  this  is  beyond  the  scope  of  this  book,  for 
we  consider  only  the  estimation  of  the  maximum  from  a  regression  equation 
of  given  form,  determined  from  a  given  set  of  experimental  points. 
Hotelling's  work  is  concerned  solely  with  the  efficient  use  of  the  regression 
relation  to  determine  a  maximum.  Very  often  the  object  of  the  experi- 
menter is  not  so  specialized;  the  position  and  value  of  the  maximum, 
although  important,  are  not  the  only  useful  items  of  information  to  be 
drawn  from  the  regression  equation. 

For  example,  in  an  agricultural  experiment  it  may  be  considered  that 
there  is  an  optimum  level  of  fertilizer  giving  maximum  yield  of  a  crop,  and 
that  levels  of  fertilizer  above  the  optimum  give  reduced  yields.  A  simple 
way  of  testing  this  supposition  would  be  to  fit  a  parabola  to  the  relation  of 
yield  to  fertilizer  level.  A  significant  negative  quadratic  term  in  the 
parabola  would  indicate  the  existence  of  a  maximum  yield  corresponding 
to  the  optimum  fertilizer  level. 

The optimum value of the independent variable is readily estimated.
If the equation is

$$Y = b_0 + b_1 x + b_2 x^2,$$

the maximum (or minimum) value of Y occurs when

$$\frac{dY}{dx} = b_1 + 2 b_2 x = 0,$$

so that the estimate is

$$x_m = -b_1 / 2 b_2,$$

the value of Y then being

$$b_0 - b_1^2 / 4 b_2.$$


This  is  a  maximum  if  b2  is  negative,  a  minimum  if  b2  is  positive.  In 
what  follows  we  shall  always  speak  of  maxima,  although  the  same 
principles  apply  to  the  determination  of  minima. 

Since $x_m$ is a ratio of two regression coefficients, its variance may be
found approximately in terms of the variances of $b_1$ and $b_2$. We have,
approximately,

$$V(x_m) = s^2 \left( \frac{t^{11}}{4 b_2^2} - \frac{b_1 t^{12}}{2 b_2^3} + \frac{b_1^2 t^{22}}{4 b_2^4} \right).$$

Using this variance, we may calculate approximate fiducial limits for the
value of the optimum. These limits are

$$X_L = x_m \left[ 1 \pm \sqrt{g_{11} - 2 g_{12} + g_{22}} \right].$$

Alternatively, we may determine exact fiducial limits by the method given
in Section 6.2 for the fiducial limits of a ratio. The fiducial limits are the
roots of the equation

$$(2 b_2 X + b_1)^2 = t^2 s^2 (4 t^{22} X^2 + 4 t^{12} X + t^{11}),$$

which may be written

$$(X - x_m)^2 = g_{22} X^2 - 2 g_{12} x_m X + g_{11} x_m^2$$

and has the solutions

$$X_L = x_m \left[ \frac{1 - g_{12} \pm \sqrt{(1 - g_{12})^2 - (1 - g_{11})(1 - g_{22})}}{1 - g_{22}} \right].$$

For the maximum value of Y, which we write $Y_m$, we can determine only
an approximate variance, and hence approximate fiducial limits. We have

$$Y_m = b_0 - b_1^2 / 4 b_2 = \bar{y} - b_1 \bar{x}_1 - b_2 \bar{x}_2 - b_1^2 / 4 b_2,$$

where $\bar{x}_2 = Sx^2 / n$.

Hence, taking differentials,

$$dY_m = d\bar{y} - (\bar{x}_1 + b_1 / 2 b_2)\,db_1 - (\bar{x}_2 - b_1^2 / 4 b_2^2)\,db_2 = d\bar{y} - (\bar{x}_1 - x_m)\,db_1 - (\bar{x}_2 - x_m^2)\,db_2.$$

On squaring both sides and taking expectations, we find the approximate
variance, whose estimate from the sample is

$$V(Y_m) = s^2 \left[ \frac{1}{n} + (\bar{x}_1 - x_m)^2 t^{11} + 2 (\bar{x}_1 - x_m)(\bar{x}_2 - x_m^2) t^{12} + (\bar{x}_2 - x_m^2)^2 t^{22} \right]. \qquad (6.8)$$

As explained in Section 6.2, the fact that the fiducial limits for the value
of $x_m$ include infinity (or, as it is sometimes expressed, the limits are
exclusive) indicates that there is no evidence for the existence of a maximum;
in other words, the coefficient $b_2$ is not significant at the level of significance
chosen.

6.12    EXAMPLE  OF  ESTIMATING  A  MAXIMUM

Example  6.4  The  pH  of  Ret  Liquor  at  Which  the  Rate  of  Retting 
is  Maximized.  One  method  of  retting  flax  is  to  steep  it  in  water  for  several 
days,  so  that  bacteria  selectively  attack  the  straw,  removing  the  material  binding 
the  fiber  bundles  together.  The  rate  at  which  the  retting  takes  place  is  measured 
by  the  change  in  buffer  capacity  of  the  ret  liquor;  it  is  found  that  the  rate 
increases  with  increasing  pH  of  ret  liquor  and  then  falls  off  as  pH  increases  still 
further. 

The  data  analyzed  to  determine  the  maximum  rate  came  from  an  experiment 
in  which  flax  at  various  stages  of  maturity  was  retted,  there  being  in  all  fourteen 
rets ;  the  pH  and  buffer  capacity  rate  (among  other  things)  were  measured  at 
five  times  during  the  ret.  Although  the  observations  at  different  times  may 
not  be  independent,  the  data  are  nevertheless  satisfactory  for  determining  the 
maximum;  the  disturbance  to  the  fiducial  limits  is  not  likely  to  be  serious. 
The  quadratic  regression  of  rate  (y)  on  pH  (x)  was  determined  from  the  sums  of 
squares  and  products  for  the  interaction  of  rets  and  times,  with  52  degrees  of 
freedom.  The  relevant  sums  of  squares  and  products  are  set  out  in  Table  6.4, 
the  inverse  matrix  and  the  regression  coefficients  with  their  standard  errors  in 
Table  6.5.  The  analysis  of  variance,  the  purpose  of  which  is  to  provide  the 
residual  mean  square  from  which  to  estimate  the  standard  errors  of  the  regression 
coefficients,  is  given  in  Table  6.6. 

TABLE 6.4

Sums of Squares and Products for Determining Quadratic
Regression of Buffer Capacity Rate (y) on pH (x)

             x          $x^2$         y

x          0.6802      6.5126     -0.094 56
$x^2$      6.5126     62.3770     -0.921 14
y                                  0.077 998
TABLE 6.5
Inverse Matrix and Regression Coefficients

       Inverse Matrix              Regression Coefficient

   4192.950      -437.773 6       $b_1$ =  6.765 42 ± 2.1
   -437.773 6      45.722 69      $b_2$ = -0.721 127 ± 0.22

TABLE 6.6
Analysis of Variance of Buffer Capacity Rate

               D.F.    Sum of Squares    Mean Square

Regression       2       0.024 521        0.012 260
Residual        50       0.053 477        0.001 070

Total           52       0.077 998

Since the quadratic regression is significant at the 1 per cent level, it is
appropriate to calculate the position of the maximum rate and its 99 per cent
fiducial limits. We have

$$x_m = -b_1 / 2 b_2 = 6.765 / (2 \times 0.7211) = 4.69.$$

The fiducial limits are found as follows:
For 50 degrees of freedom,

F = 7.171

so that $F s^2 = 0.007\,669$.

From this and the matrix $T^{-1}$ and the regression coefficients given in Table 6.5
we derive the matrix G (Table 6.7). The fiducial limits are, then,

$$4.69 \left[ \frac{0.3118 \pm \sqrt{0.3118^2 - 0.2974 \times 0.3257}}{0.3257} \right] = 4.69 \times 0.899 \text{ and } 4.69 \times 1.015 = 4.22 \text{ and } 4.76.$$

TABLE 6.7

The Matrix G

   0.7026        0.6882
   0.6882        0.6743

From the values of the $g_{hi}$ it is clear that the approximate fiducial limits will be
inaccurate. The approximate limits are, in fact,

4.58  and  4.80, 

which  are  clearly  unsatisfactory. 
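
The exact and approximate limits of Example 6.4 can be recomputed from the figures in Tables 6.5 and 6.6 with the following Python sketch. The function name and argument names are illustrative. Because the quantity under the square root is a small difference of nearly equal terms, results computed from unrounded intermediate values may differ in the second decimal place from those printed above.

```python
import math

def maximum_fiducial_limits(b1, b2, c11, c12, c22, s2, f_value):
    """Position of the maximum of Y = b0 + b1*x + b2*x^2 with fiducial limits.

    c11, c12, c22 : elements of the inverse matrix (t^11, t^12, t^22)
    f_value       : tabulated F (equal to t squared) for the chosen level
    Returns x_m, the exact limits, and the approximate limits.
    """
    x_m = -b1 / (2.0 * b2)
    k = f_value * s2
    g11 = k * c11 / b1 ** 2
    g12 = k * c12 / (b1 * b2)
    g22 = k * c22 / b2 ** 2
    if g22 >= 1.0:
        raise ValueError("b2 not significant: limits are exclusive")
    disc = (1.0 - g12) ** 2 - (1.0 - g11) * (1.0 - g22)
    if disc < 0.0:
        raise ValueError("limits are complex")
    root = math.sqrt(disc)
    exact = sorted(x_m * ((1.0 - g12) + sign * root) / (1.0 - g22)
                   for sign in (-1.0, 1.0))
    spread = math.sqrt(max(g11 - 2.0 * g12 + g22, 0.0))
    approx = sorted(x_m * (1.0 + sign * spread) for sign in (-1.0, 1.0))
    return x_m, exact, approx

# Figures from Tables 6.5 and 6.6; F = 7.171 for 1 and 50 degrees of freedom.
x_m, exact, approx = maximum_fiducial_limits(
    b1=6.76542, b2=-0.721127,
    c11=4192.950, c12=-437.7736, c22=45.72269,
    s2=0.053477 / 50, f_value=7.171)
print(round(x_m, 2), [round(v, 2) for v in exact], [round(v, 2) for v in approx])
```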

In  this  example  the  actual  maximum  rate  will  vary  from  ret  to  ret  and  is  not  of 
interest.  However,  were  its  standard  error  required,  this  would  be  given 
approximately  by  formula  (6.8),  with  n  put  equal  to  5  and  xx  and  x2  taking 
the  values  of  the  mean  for  the  ret  considered. 
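The arithmetic of this example may be retraced with a short Python sketch. It takes the inverse matrix and coefficients of Table 6.5 and the value of Fs^2 quoted above as given. The form assumed for the elements of G, namely g_hi = F s^2 t^{hi} / (b_h b_i), is inferred from the figures of Table 6.7 rather than quoted from the text, and the leading element 4192.950 of the inverse matrix is taken as the value consistent with inverting Table 6.4.

```python
import math

# Inverse matrix T^-1 and regression coefficients, as in Table 6.5.
t11, t12, t22 = 4192.950, -437.7736, 45.72269
b1, b2 = 6.76542, -0.721127
F_s2 = 0.007669  # F * s^2 for 50 degrees of freedom, as quoted in the text

# Position of the maximum of the fitted parabola.
x_m = -b1 / (2.0 * b2)

# Elements of G (assumed form: g_hi = F s^2 t^{hi} / (b_h b_i)).
g11 = F_s2 * t11 / (b1 * b1)
g12 = F_s2 * t12 / (b1 * b2)
g22 = F_s2 * t22 / (b2 * b2)

# Exact limits: x_m times the roots r of
# (1 - g22) r^2 - 2 (1 - g12) r + (1 - g11) = 0.
disc = (1 - g12) ** 2 - (1 - g11) * (1 - g22)
lower = x_m * ((1 - g12) - math.sqrt(disc)) / (1 - g22)
upper = x_m * ((1 - g12) + math.sqrt(disc)) / (1 - g22)

print(f"x_m = {x_m:.2f}")
print(f"G   = [[{g11:.4f}, {g12:.4f}], [{g12:.4f}, {g22:.4f}]]")
print(f"limits: {lower:.2f} and {upper:.2f}")
```

The discriminant is the small difference of two nearly equal quantities, so the limits are sensitive to rounding in the last figure; the sketch should agree with the 4.22 and 4.76 of the text only to about one unit in the second decimal place.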


6.13    EXTENSION  TO  HIGHER  DERIVATIVES 

An  interesting  example  of  how  these  ideas  may  be  extended  arises  in 
experiments  on  the  retting  of  flax  similar  to  that  discussed  in  Example  6.4. 
The  buffer  capacity  of  the  retting  liquor  is  taken  as  a  measure  of  the  stage 
the  ret  has  reached ;  the  rate  of  change  of  buffer  capacity  is  accordingly 
considered  to  measure  the  rate  of  retting.  In  studying  the  progress  of  the 
ret,  it  is  of  interest  to  know  when  the  retting  rate  reaches  its  maximum. 
Rather  than  the  buffer  capacity  rate,  we  may  take  the  buffer  capacity 
itself  as  the  dependent  variable  and  determine  its  regression  on  time.     If 
this may be represented by a polynomial, the time of the maximum rate of
retting may be found as the time when the second derivative of this
polynomial vanishes. Thus, if the equation is

\[ Y = b_0 + b_1 x + b_2 x^2 + b_3 x^3, \]

where x is the time and y the buffer capacity, then

\[ d^2Y/dx^2 = 2b_2 + 6b_3 x. \]

The time of maximum rate of retting is therefore

\[ x_m = -b_2/(3b_3). \]

The  treatment  of  this  problem,  including  the  determination  of  fiducial 
limits  for  xm  and  the  maximum  rate,  follows  exactly  the  same  lines  as  that 
of  estimating  a  maximum. 
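As a small illustration of the point estimate only, the following Python sketch finds the time at which the second derivative of a fitted cubic vanishes and the rate dY/dx at that time. The function name and the coefficients are hypothetical, chosen for illustration; they are not taken from any retting experiment.

```python
def time_of_max_rate(b):
    """b = (b0, b1, b2, b3) of Y = b0 + b1 x + b2 x^2 + b3 x^3."""
    b0, b1, b2, b3 = b
    if b3 == 0:
        raise ValueError("cubic term is zero: the rate has no turning point")
    # Root of d2Y/dx2 = 2 b2 + 6 b3 x.
    x_m = -b2 / (3.0 * b3)
    # Rate dY/dx evaluated at x_m.
    rate = b1 + 2.0 * b2 * x_m + 3.0 * b3 * x_m ** 2
    # The rate is a maximum (rather than a minimum) only when b3 < 0.
    return x_m, rate, b3 < 0


# Hypothetical coefficients, for illustration only.
x_m, rate, is_max = time_of_max_rate((1.0, 0.5, 0.9, -0.1))
print(x_m, rate, is_max)
```

The check on the sign of b3 is worth keeping in practice, since the same formula locates a minimum of the rate when the cubic coefficient is positive.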

6.14    MAXIMA  OF  POLYNOMIAL  CURVES 

The  maxima  of  polynomial  curves  in  general  may  be  estimated  in  the 
same  way  as  the  maximum  of  a  parabola,  but  since  there  is  in  general 
more  than  one  maximum  (or  minimum),  and  because  the  equation  for  the 
maximum  may  yield  complex  roots,  the  general  case  needs  separate 
discussion. 

The  principles  are  made  sufficiently  clear  from  the  discussion  of  an 
equation  of  the  fourth  degree.     Suppose  that  the  regression  equation  is 

\[ Y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + b_4 x^4. \]

Then the equation for the position of the maximum is

\[ b_1 + 2b_2 x + 3b_3 x^2 + 4b_4 x^3 = 0. \tag{6.9} \]

This  equation  has  certainly  one  real  root,  the  other  two  roots  being  either 
both  real  or  both  complex. 

Since  a  curve  of  the  fourth  degree  may  have  two  maxima  and  one 
minimum  (or  two  minima  and  one  maximum),  the  most  direct  way  of 
finding  out  the  required  maximum  is  to  plot  the  curve  or  prepare  a  table  of 
Y  for  various  values  of  x.  Equation  (6.9)  has  value  in  giving  the  maximum 
accurately.  Also,  if  fiducial  limits  for  the  position  of  the  maximum  are 
required,  recourse  must  be  had  to  algebraic  methods.  The  fiducial  limits 
are  a  set  of  six  values,  two  for  each  of  the  three  roots  of  (6.9),  given  by  the 
sixth-degree  equation 

\[ (b_1 + 2b_2 x + 3b_3 x^2 + 4b_4 x^3)^2 = t^2 s^2 [\, t^{11} + 4t^{12} x + (4t^{22} + 6t^{13}) x^2
   + (12t^{23} + 8t^{14}) x^3 + (9t^{33} + 16t^{24}) x^4 + 24t^{34} x^5 + 16t^{44} x^6 \,]. \]
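The equation for the limits can be assembled and solved numerically. The sketch below is one possible arrangement in Python with NumPy, assuming that the t^{hi} are the elements of the inverse matrix of sums of squares and products; the function name is hypothetical. It builds the squared derivative and the variance form as polynomials in x for any degree k and returns all roots, real or complex. As a check it is applied to the parabola of Example 6.4, for which the equation is only a quadratic and t^2 s^2 is the quantity Fs^2 = 0.007669.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def fiducial_limit_roots(b, T_inv, t2s2):
    """Roots of (dY/dx)^2 = t^2 s^2 * (variance form of dY/dx).

    b     : coefficients b1..bk of x, x^2, ..., x^k (b0 is not needed)
    T_inv : k x k inverse matrix of sums of squares and products
    t2s2  : t^2 s^2 for the chosen probability level
    """
    b = np.asarray(b, dtype=float)
    k = len(b)
    # dY/dx = sum_h h b_h x^(h-1), coefficients in ascending powers of x.
    deriv = np.array([(h + 1) * b[h] for h in range(k)])
    lhs = P.polymul(deriv, deriv)
    # Variance form: sum_h sum_i h i t^{hi} x^(h+i-2).
    rhs = np.zeros(2 * k - 1)
    for h in range(k):
        for i in range(k):
            rhs[h + i] += (h + 1) * (i + 1) * T_inv[h][i]
    return P.polyroots(P.polysub(lhs, t2s2 * rhs))


# Check on the parabola of Example 6.4 (k = 2).
T_inv = [[4192.950, -437.7736], [-437.7736, 45.72269]]
limits = fiducial_limit_roots([6.76542, -0.721127], T_inv, 0.007669)
print(np.sort_complex(limits))
```

For a fourth-degree regression the same call, with four coefficients and a 4 x 4 inverse matrix, yields the six values discussed in the text; roots with a non-negligible imaginary part correspond to limits that are not real.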
Figure 6.4. Fitted curves and curves of their derivatives, showing maxima and minima
and  their  fiducial  limits:  (i)  one  maximum  and  two  minima,  with  four  real  and  two 
complex  limits,  (ii)  one  minimum  but  four  real  limits. 


In discussing the parabolic case, we pointed out that the fiducial limits
determined from such an equation could be complex, even though the
position of the maximum is necessarily real. For higher-degree polynomials
it is, on the other hand, also possible for the position of some of the
maxima to be complex, although fiducial limits for them are real. This
point may be made clear by examining the diagrams of Figure 6.4. The
curves show Y and dY/dx plotted against x, for various polynomials Y.
Curve (i.1) shows a polynomial with one maximum and two minima,
whose positions are given by the points where curve (i.2) cuts the x-axis.
The broken curves in (i.2) give the fiducial boundary for dY/dx. Since
the fiducial boundary cuts the x-axis in only four points, we see that in
this case two of the fiducial limits are complex. Curve (ii.1) shows the
same polynomial with a linear term deducted. It has but one minimum.
The corresponding curve in (ii.2) differs from the curve in (i.2) only by a
constant. Although this curve now cuts the x-axis in only one point,
corresponding to the extant minimum, the fiducial boundaries still cut the
x-axis in four points, showing that there is still some evidence for the
existence of the maximum and minimum whose estimates do not take real
values.

Inspection  of  the  curves  gives  a  better  picture  of  the  situation  than  can 
be  conveyed  by  any  description;  it  is  left  to  the  reader  to  examine  the 
different  possible  cases  that  may  arise  and  to  consider  their  practical 
interpretation.  This  problem  is  an  example  of  how  the  study  of  the 
practical  issues  involved  leads  to  a  much  clearer  interpretation  than  direct 
mathematical analysis alone; it points to the fact that, although mathematical
analysis is a tool in the solution of these problems, it will not, of
itself,  always  lead  to  the  answer  that  is  required. 

These  results  show  that,  although  in  practice  there  may  be  computational 
difficulties  in  the  determination  of  maxima  and  their  fiducial  limits,  in 
principle  their  determination  is  a  straightforward  application  of  the  theory 
of  estimation  outlined  in  Section  1.9  and  in  Section  6.2. 


CHAPTER     7 


The  Analysis  of  Covariance 


7.1    ANALOGY  OF  COVARIANCE  ANALYSIS  AND 
MULTIPLE  REGRESSION 

When  an  experiment  is  carried  out  to  study  the  effects  of  different 
factors  on  a  variable,  it  is  desirable  to  keep  the  values  of  all  extraneous 
factors under control as far as possible. Some factors cannot be controlled,
however, or it may be impracticable to try to do so; but if their
values can be measured, this additional information may be used to
improve the accuracy of the results for the variable studied. For example,
in determining the effect of rate of loading on the modulus of rupture of
timber specimens, it is not always possible to control temperature and
humidity.  If  these  variables  are  measured  at  the  time  of  test,  it  is  possible 
to  make  an  adjustment  for  their  effects.  In  many  cases  it  is  found 
sufficiently  accurate  to  make  an  empirical  linear  adjustment,  based  on 
prior  knowledge,  to  correct  the  results  to  a  chosen  temperature  and 
humidity.  If  the  correction  factor  for  the  variable  x  is  c,  the  corrected 
value  of  the  dependent  variable  y  will  be 

\[ y - c(x - \bar{x}). \]

When  c  is  an  empirical  adjustment,  the  corrected  values  may  be  analyzed 
in  the  same  way  as  the  original  values  would  have  been.  In  other  cases, 
where  there  is  no  prior  knowledge,  the  effect  of  the  extraneous  variables 
may  be  eliminated  by  means  of  a  regression  adjustment. 

One  incidental  advantage  of  employing  a  regression  adjustment  for 
extraneous  variables  is  that  it  enables  the  magnitude  of  their  effects  to  be 
estimated  and  tested  for  significance.  It  will  often  be  worthwhile  to 
make  a  regression  adjustment,  even  when  the  regression  is  not  significant; 
but  a  significance  test,  taken  in  conjunction  with  other  information,  will 
enable  the  experimenter  to  decide  which  variables  are  worth  adjusting  for. 

It  can  be  seen  that  the  regression  method  of  adjustment  for  extraneous 
variables  is  equivalent,  formally,  to  the  determination  of  a  multiple 
regression  including  both  the  treatments  and  the  extraneous  variables. 
The adjustment of the effects of the treatments is analogous to the change
from a simple to a partial regression coefficient. Indeed, in multiple
regression the same result is obtained by adjusting the dependent and all
the controlled variables for the uncontrolled variables, as would be
obtained from a regression on all variables, controlled and uncontrolled.
Thus, the methods of adjustment given here could quite logically have
been discussed in Chapter 3.¹ However, when the treatments are
categories rather than numerical variables, it is for practical and
computational reasons more convenient to discuss their adjustment in terms
of the analysis of covariance than in terms of multiple regression. But
before proceeding to do this, we need to consider further the role that
the independent variables may play in the analysis. The reader may at
each point satisfy himself of the analogy with multiple regression.
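The formal equivalence can be checked numerically. The sketch below, in Python with NumPy, uses simulated data (the group sizes, effects and random seed are arbitrary assumptions, not data from the text). It fits (a) a multiple regression on treatment indicator variables together with a concomitant x, and (b) the covariance form, in which treatment means are first removed from y and x and the regression is taken on the residuals. The two routes should give the same coefficient for x and the same adjusted treatment means.

```python
import numpy as np

rng = np.random.default_rng(0)
n_groups, n_per = 4, 8
group = np.repeat(np.arange(n_groups), n_per)
# Treatment indicator (pseudo)variables, one column per treatment.
D = (group[:, None] == np.arange(n_groups)).astype(float)
x = rng.normal(20.0, 1.0, size=group.size)            # concomitant variable
y = 50.0 + 3.0 * group - 2.0 * x + rng.normal(0.0, 1.0, size=group.size)

# (a) Multiple regression of y on the indicators and x together.
coef_full, *_ = np.linalg.lstsq(np.column_stack([D, x]), y, rcond=None)
b_full = coef_full[-1]

# (b) Covariance analysis: remove treatment means, regress residual on residual.
y_means = D.T @ y / n_per
x_means = D.T @ x / n_per
y_res = y - D @ y_means
x_res = x - D @ x_means
b_resid = (x_res @ y_res) / (x_res @ x_res)

# Adjusted treatment means, referred to the general mean of x.
adj_means = y_means - b_resid * (x_means - x.mean())
full_means = coef_full[:-1] + b_full * x.mean()

print(b_full, b_resid)
print(adj_means)
print(full_means)
```

The indicator columns play the part of the pseudovariables mentioned in the footnote; with p + 1 treatments and no separate constant term, they span the same space as a constant and p independent comparisons.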


7.2    ENVIRONMENTAL  AND  EXPLANATORY  VARIABLES 

In  regression  analysis  an  independent  variable  may  be  included  for  one 
of  two  reasons :  (i)  as  an  estimator  of  the  dependent  variable,  accounting 
to  a  considerable  extent  for  the  variation  in  the  latter;  and  (ii)  as  a 
correction  to  the  dependent  and  the  other  independent  variables,  so 
strengthening  the  relationship  found  between  them.  In  covariance 
analysis  the  distinction  is  an  important  one ;  although  the  mechanism  of 
the  analysis  is  the  same  for  both  types  of  variables,  the  interpretation 
is  different.  When,  as  in  the  example  just  given,  the  concomitant  variables, 
temperature  and  humidity,  are  unaffected  by  treatment,  but  are  correlated 
with the dependent variable, so that they may be used to reduce its error,
we shall call them environmental variables, since they represent uncontrolled
environmental  effects.  In  other  cases  the  concomitants  may  be  affected 
by  the  treatments  also,  and  then  the  question  arises  whether  the  effect  of 
treatment  on  the  variable  being  studied  may  be  explained  in  terms  of  its 
effect  on  the  concomitants.  The  adjustment  for  the  concomitants  is 
really  a  means  of  eliminating  from  the  dependent  variable  the  part  of  the 
treatment  effect  that  is  attributable  to  the  effect  of  the  treatments  on  the 
concomitants.     Variables  treated  in  this  way  will  be  termed  explanatory. 

Clearly, an explanatory variable may also be subject to error variation
which is correlated with that of the dependent variable and is therefore
environmental. Even so, its importance lies not in its reducing the error
variance but in its affording an explanation of the treatment effects. In
practice, an environmental variable is one that is independent of the
treatments, for instance, a variable observed before the experiment is
performed. If a presumed environmental variable is found to be significantly
affected by treatments, an explanation must be sought and the
effect allowed for in the interpretation of the data.

¹ A factor which is tested at p + 1 levels leads to an analysis with p degrees of freedom
for the factor. Formally, this analysis is equivalent to a regression on p independent
variables. Indeed, the p independent comparisons among the variants of the factor
may for many purposes be conveniently represented as p pseudovariables, suitably
defined.


7.3    EXAMPLE  OF  AN  ENVIRONMENTAL  VARIABLE 

The  actual  calculation  and  interpretation  of  a  covariance  analysis  are 
best  made  clear  by  means  of  examples ;  the  first  is  the  adjustment  for  an 
environmental  variable. 

Example  7.1  The  Effect  of  Temperature  on  the  Maximum 
Compressive Strength of Specimens of Hoop Pine (Araucaria cunninghamii),
Adjusted for Variations in Moisture Content. In a series of
experiments  (Sulzberger,  1953)  on  the  effect  of  temperature  on  the  strength 
properties  of  timber  species,  specimens  were  tested  over  a  range  of  temperatures 
and  at  various  moisture  contents.  The  moisture  content  of  20  per  cent  could  be 
maintained  only  in  conditions  of  high  humidity  and  was  therefore  difficult  to 
control.  For  the  specimens  tested  in  this  condition,  therefore,  the  moisture 
content  varied  considerably,  so  a  covariance  adjustment  for  the  effect  of  moisture 
content  was  made  to  these  data.  For  each  species,  material  from  ten  trees  was 
taken,  two  specimens  from  each  being  tested  at  each  temperature.  The  results 
for  the  maximum  compressive  strength  parallel  to  the  grain  for  hoop  pine  are 
set  out  in  Table  7.1. 

The  first  step  in  the  analysis  is  to  compute  the  sums  of  squares  and  products 
of  the  two  variables,  attributable  to  the  different  factors.  The  total  variation 
is  split  up  into  parts  corresponding  to  trees  and  temperatures,  leaving  a  residual 
attributable  to  experimental  error.  This  leads  to  an  analysis  such  as  is  set  out 
in  the  first  five  columns  of  Table  7.2.  The  calculations  are  described  in  standard 
textbooks  dealing  with  the  analysis  of  variance.  The  sums  of  squares  between 
temperatures,  for  instance,  are  derived  from  the  totals  for  each  temperature, 
that  for  y  being 

\[ \frac{151,350^2 + 135,450^2 + 105,050^2 + 84,190^2 + 65,170^2}{20} - \frac{541,210^2}{100}
   = 252,123,000. \]

The  other  sums  of  squares  and  products  are  calculated  in  a  similar  manner. 
The  "residual"  line  of  Table  7.2  gives  sums  of  squares  and  products  from 
which  the  effects  of  temperature  and  of  variation  between  trees  have  been 
eliminated.  The  regression  coefficient  of  maximum  compressive  strength  on 
moisture  content  is  calculated  from  these  sums  of  squares  and  products : 

\[ b = -6323.1/35.1846 = -179.7. \]
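The sums of squares and products described above can be recomputed from the pair totals of Table 7.1. The following Python sketch does so for trees, temperatures, residual and total; the function and variable names are hypothetical. Each entry is a total of two specimens, so squares and products of entries are divided by 2, those of temperature totals by 20, those of tree totals by 10, and the correction term uses 100 specimens, as in the expression for temperatures given above.

```python
# Pair totals from Table 7.1: rows are trees 1-10,
# columns are temperatures -20, 0, 20, 40, 60 deg C.
Y = [
    [13140, 12460, 9430, 7630, 6340],
    [15900, 14110, 11300, 9560, 7270],
    [13390, 12320, 9650, 7900, 6410],
    [15510, 13680, 10330, 8270, 7060],
    [15530, 13160, 10290, 8670, 6680],
    [15260, 13640, 10350, 8670, 6620],
    [15060, 13250, 10560, 8100, 6150],
    [15210, 13540, 10460, 8300, 6090],
    [16900, 15230, 11940, 9340, 6260],
    [15450, 14060, 10740, 7750, 6290],
]
X = [
    [42.1, 41.1, 43.1, 41.4, 39.1],
    [41.0, 39.4, 40.3, 38.6, 36.7],
    [41.1, 40.2, 40.6, 41.7, 39.7],
    [40.1, 39.8, 40.4, 39.8, 39.3],
    [41.0, 41.2, 39.7, 39.0, 39.0],
    [42.0, 40.0, 40.3, 40.9, 41.2],
    [40.4, 39.0, 34.9, 40.1, 41.4],
    [39.3, 38.8, 37.5, 40.6, 41.8],
    [39.2, 38.5, 38.5, 39.4, 41.7],
    [37.7, 35.7, 36.7, 38.9, 38.2],
]
PAIR = 2  # each entry is the total for two specimens


def products(U, V):
    """Sums of products of U and V for each line of the analysis."""
    n_tree, n_temp = len(U), len(U[0])
    n_obs = PAIR * n_tree * n_temp
    cf = sum(map(sum, U)) * sum(map(sum, V)) / n_obs  # correction term
    total = sum(u * v for ru, rv in zip(U, V) for u, v in zip(ru, rv)) / PAIR - cf
    trees = sum(sum(ru) * sum(rv) for ru, rv in zip(U, V)) / (PAIR * n_temp) - cf
    cu = [sum(col) for col in zip(*U)]
    cv = [sum(col) for col in zip(*V)]
    temps = sum(a * b for a, b in zip(cu, cv)) / (PAIR * n_tree) - cf
    return {"trees": trees, "temperatures": temps,
            "residual": total - trees - temps, "total": total}


Syy, Sxy, Sxx = products(Y, Y), products(X, Y), products(X, X)
b = Sxy["residual"] / Sxx["residual"]  # regression coefficient, residual line

for line in ("trees", "temperatures", "residual", "total"):
    print(f"{line:13s} {Syy[line]:14.0f} {Sxy[line]:10.1f} {Sxx[line]:9.4f}")
print(f"b = {b:.1f}")
```

The printed lines should agree with the first columns of Table 7.2 up to the rounding used in the book, where the sums of squares for y are given to the nearest hundred.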
TABLE 7.1

Maximum Compressive Strength Parallel to Grain (y) and
Moisture Content (x, Nominally 20 Per Cent) of Hoop Pine
Specimens, at Different Temperatures

Totals of Pairs from Same Tree

                              Temperature, °C.
Tree            -20         0        20        40        60       Total
  1     y     13,140    12,460     9,430     7,630     6,340     49,000
        x       42.1      41.1      43.1      41.4      39.1      206.8
  2     y     15,900    14,110    11,300     9,560     7,270     58,140
        x       41.0      39.4      40.3      38.6      36.7      196.0
  3     y     13,390    12,320     9,650     7,900     6,410     49,670
        x       41.1      40.2      40.6      41.7      39.7      203.3
  4     y     15,510    13,680    10,330     8,270     7,060     54,850
        x       40.1      39.8      40.4      39.8      39.3      199.4
  5     y     15,530    13,160    10,290     8,670     6,680     54,330
        x       41.0      41.2      39.7      39.0      39.0      199.9
  6     y     15,260    13,640    10,350     8,670     6,620     54,540
        x       42.0      40.0      40.3      40.9      41.2      204.4
  7     y     15,060    13,250    10,560     8,100     6,150     53,120
        x       40.4      39.0      34.9      40.1      41.4      195.8
  8     y     15,210    13,540    10,460     8,300     6,090     53,600
        x       39.3      38.8      37.5      40.6      41.8      198.0
  9     y     16,900    15,230    11,940     9,340     6,260     59,670
        x       39.2      38.5      38.5      39.4      41.7      197.3
 10     y     15,450    14,060    10,740     7,750     6,290     54,290
        x       37.7      35.7      36.7      38.9      38.2      187.2

Total   y    151,350   135,450   105,050    84,190    65,170    541,210
        x      403.9     393.7     392.0     400.4     398.1    1,988.1

This  coefficient  is  the  appropriate  correction  to  be  applied  to  adjust  for  random 
variations  in  moisture  content.  The  mean  values  of  maximum  compressive 
strength  and  moisture  content  for  each  temperature,  and  the  means  for  the 
former  corrected  to  20  per  cent  moisture  content,  are  set  out  in  Table  7.3. 
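The correction of the means can be retraced from the temperature totals of Table 7.1 and the coefficient b = -179.7. The short Python sketch below is only an illustration of the arithmetic; the variable names are hypothetical.

```python
temps = [-20, 0, 20, 40, 60]                       # deg C
y_tot = [151350, 135450, 105050, 84190, 65170]     # totals of y, Table 7.1
x_tot = [403.9, 393.7, 392.0, 400.4, 398.1]        # totals of x, Table 7.1
n = 20        # specimens contributing to each temperature total
b = -179.7    # residual regression of strength on moisture content
x0 = 20.0     # nominal moisture content, per cent

adjusted = []
for t, ty, tx in zip(temps, y_tot, x_tot):
    y_bar, x_bar = ty / n, tx / n
    # Mean strength corrected to the nominal moisture content.
    adjusted.append(y_bar - b * (x_bar - x0))
    print(f"{t:4d}  {y_bar:7.1f}  {x_bar:6.3f}  {adjusted[-1]:7.1f}")
```

Since b is negative, a temperature whose specimens were wetter than 20 per cent has its mean strength raised by the correction, and one whose specimens were drier has it lowered.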

The covariance adjustment for the independent variable may be applied to the
treatment means even when its effect is not significant, just as effects of variation
between trees or those caused by other external factors would be eliminated
whether or not they were significant. However, the improvement due to
covariance is not likely to be large unless the regression is significant. The
analysis of the residual variation in Table 7.4 shows that the regression is in this
case highly significant.
TABLE 7.2

Analysis of Covariance of Maximum Compressive Strength (y)
and Moisture Content (x)

                      Sums of Squares and Products            y, Adjusted for x
                D.F.       y^2           xy        x^2      D.F.      S.S.        M.S.
Trees             9      9,503,300    -7,610.1   27.1469
Temperatures
  Linear          1    250,029,500     5,478.7    0.1200
  Deviation       3      2,093,500      -755.5    4.5974      3    1,970,300    656,800**
  Total           4    252,123,000     4,723.2    4.7174
Residual         36      4,187,400    -6,323.1   35.1846     35    3,051,100     87,170

Total            49    265,813,700    -9,210.0   67.0489

** Significant at 1 per cent level

TABLE 7.3

Original and Corrected Means of Maximum Compressive
Strength

Temperature, °C.     $\bar{y}$     $\bar{x}$     $\bar{y} - b(\bar{x} - 20)$
      -20              7568          20.20              7603
        0              6772          19.68              6716
       20              5252          19.60              5181
       40              4210          20.02              4213
       60              3258          19.90              3241

TABLE  7.4 

Test  of  Residual  Regression  of  Maximum  Compressive 

Strength  on  Moisture  Content 

D.F.  Sum  of  Squares  Mean  Square 

Regression  1  1,136,300  1,136,300** 

Residual,  reduced  35  3,051,100  87,170 


Residual  36  4,187,400  116,320 

**  Significant  at  1  per  cent  level 


The  regression  sum  of  squares  is  determined  as 

(-6323.1)2/35.1846  =  1,136,300. 

The residual mean square is reduced from 116,320 to 87,170, giving an increase
in efficiency of about 33 per cent. This means that the use of covariance has
improved the accuracy to about the same extent as increasing the number of
trees from 10 to 13.
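The entries of Table 7.4 and the gain in efficiency follow directly from the residual line of Table 7.2. A minimal Python sketch of the arithmetic, with hypothetical variable names, is:

```python
# Residual line of Table 7.2.
S_yy, S_xy, S_xx, df = 4_187_400.0, -6323.1, 35.1846, 36

reg_ss = S_xy ** 2 / S_xx            # regression sum of squares, 1 d.f.
reduced_ss = S_yy - reg_ss           # reduced residual, df - 1 d.f.
ms_before = S_yy / df                # residual mean square before adjustment
ms_after = reduced_ss / (df - 1)     # reduced residual mean square
F = reg_ss / ms_after                # test of the residual regression
efficiency = ms_before / ms_after
equivalent_trees = 10 * efficiency   # trees needed for equal accuracy unadjusted

print(f"regression S.S. = {reg_ss:,.0f}, reduced M.S. = {ms_after:,.0f}")
print(f"F = {F:.1f}, efficiency = {efficiency:.2f}, "
      f"equivalent trees = {equivalent_trees:.1f}")
```

The figures differ slightly from those of Table 7.4 in the last places, because the book rounds the sums of squares to the nearest hundred before forming the mean squares.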
Incidentally, the analysis of variance of the moisture contents furnishes no
evidence  that  moisture  content  has  been  affected  by  temperature;  the  relevant 
mean  squares  are 

Temperatures  1.1794 

Residual  0.9774. 

There  is  therefore  no  reason  why  moisture  content  should  not  be  employed  as  an 
environmental  variable. 


7.4    SIGNIFICANCE  OF  TREATMENT  EFFECTS  AFTER 
ADJUSTMENT 

There  remains  now  the  question  of  testing  the  significance  of  the  effects 
of  temperature  after  the  adjustment  for  moisture  content  has  been  made. 
Since  in  this  work  it  was  expected  that  the  regression  of  compressive 
strength  on  temperature  would  be  linear,  it  is  first  necessary  to  test  the 
significance  of  the  deviation  from  linearity.  For  this  reason  the  sums  of 
squares  and  products  for  temperature  in  Table  7.2  have  been  partitioned 
into  components  for  linear  trend  and  deviation  from  linearity. 

Since  a  common  regression  coefficient  has  been  used  in  correcting  the 
means  shown  in  Table  7.3,  these  means  are  correlated.  These  correlations 
need  to  be  allowed  for  in  testing  the  significance  of  treatment  comparisons. 
Just  as,  in  multiple  regression,  the  adjustment  of  the  dependent  variable 
for  one  of  the  independent  variables  affects  the  relations  between  it  and  the 
remaining  independent  variables,  so  in  covariance  analysis  the  adjustment 
affects  not  only  the  residual  sum  of  squares  but  also  the  sums  of  squares 
for  treatment  comparisons.  Treatment  comparisons  here  take  the  role 
of  the  remaining  independent  variables. 

However,  the  significance  of  these  comparisons  may  be  tested  by  a 
simple  device.  Suppose  that  we  wish  to  test  the  significance  of  deviation 
of  the  corrected  means  from  a  linear  relation  with  temperature.  The 
null  hypothesis  implies  that  the  sum  of  squares  for  deviation  from 
linearity  and  the  residual  sum  of  squares  are  homogeneous  apart  from 
any  effects  of  moisture  content.  Accordingly,  these  two  lines  of  the 
analysis  may  be  pooled,  and  a  correction  derived,  based  on  the  pooled 
sums  of  squares  and  products.  The  analysis  is  shown  in  Table  7.5.  Here 
the  sum  of  squares  for  regression  is 

(-7078.6)2/39.7820  =  1,259,500. 

The  reduced  pooled  sum  of  squares,  with  38  degrees  of  freedom,  is  by 
hypothesis  comparable  with  the  reduced  residual  sum  of  squares  from 
Table  7.4,  which  it  includes.  The  difference  between  these  two  represents 
deviation  from  linearity,  adjusted  for  the  effects  of  moisture  content. 
TABLE 7.5

Test of Deviation from Linearity, Adjusted for Moisture
Content

                                          Sums of Squares and Products
                                    D.F.      y^2          xy        x^2      Mean Square
Deviation plus residual              39    6,280,900   -7,078.6   39.7820
Regression                            1    1,259,500

Deviation plus residual, reduced     38    5,021,400
Residual, reduced                    35    3,051,100                             87,170

Deviation, adjusted                   3    1,970,300                            656,800**

** Significant at 1 per cent level

The  analysis  shows  that  deviation  from  linearity  is  still  highly  significant, 
although  the  sum  of  squares  is  reduced  from  2,093,500  to  1,970,300  by 
the  adjustment.  The  most  direct  way  of  calculating  the  adjusted  sum  of 
squares  is  to  add  to  the  original  sum  of  squares  for  deviations  the  residual 
regression  sum  of  squares  and  deduct  the  pooled  regression  sum  of  squares ; 
thus, 

2,093,500  +  1,136,300  -  1,259,500  =  1,970,300. 

The  analysis  of  adjusted  values  is  presented  in  the  final  three  columns  of 
Table  7.2. 
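The pooling device lends itself to a small general routine. The Python sketch below (function names hypothetical) takes any treatment line and the residual line of a covariance table, each as the triple of sums for y^2, xy and x^2, and returns the adjusted sum of squares; it is applied here to the deviation and residual lines of Table 7.2.

```python
def reduce_line(line):
    """Sum of squares of y reduced by its regression on x."""
    s_yy, s_xy, s_xx = line
    return s_yy - s_xy ** 2 / s_xx


def adjusted_ss(effect, residual):
    """Adjusted sum of squares for an effect, by pooling with the residual."""
    pooled = tuple(e + r for e, r in zip(effect, residual))
    return reduce_line(pooled) - reduce_line(residual)


deviation = (2_093_500.0, -755.5, 4.5974)     # (y^2, xy, x^2), Table 7.2
residual = (4_187_400.0, -6323.1, 35.1846)

adj = adjusted_ss(deviation, residual)
# Variance ratio on 3 and 35 degrees of freedom.
F = (adj / 3) / (reduce_line(residual) / 35)

print(f"adjusted S.S. = {adj:,.0f}, F = {F:.2f}")
```

The same routine serves for any other treatment comparison, for instance the linear component, by passing the corresponding line of the table in place of the deviation line.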

Since  the  adjustment  to  the  sum  of  squares  for  deviation  consists  of  the 
difference  of  two  regression  sums  of  squares,  it  may  be  either  positive  or 
negative.  For  this  reason,  although  we  speak  of  a  reduced  sum  of  squares 
when  the  associated  degrees  of  freedom  are  reduced  by  the  number  of 
covariance  variables  eliminated,  we  speak  of  an  adjusted  sum  of  squares 
when  the  degrees  of  freedom  are  unaltered. 

A  further  analysis  of  the  treatment  comparisons  is  often  of  interest. 
Considering  still  the  deviation  from  linearity,  we  may  derive  a  reduced  sum 
of  squares,  with  two  degrees  of  freedom,  from  a  regression  based  on  the 
deviation  line  of  Table  7.2.  The  difference  between  the  adjusted  and  the 
reduced  sums  of  squares  is  a  sum  of  squares  representing  the  difference  of 
regressions  from  the  deviation  and  the  residual  lines  of  the  analysis ;  this 
difference  of  regressions  may  also  be  interpreted  as  the  regression  of  the 
temperatures  component  of  y  on  x  (Williams,  1954).  In  some  situations, 
although  not  in  the  present  one,  it  is  of  interest  to  examine  separately  the 
magnitude of these two effects.

TABLE 7.6
Partition of the Adjusted Sum of Squares for Deviations

                             D.F.   Sum of Squares   Mean Square
Difference of regressions      1            900          900(n)
Deviation, reduced             2      1,969,400      984,700**

Deviation, adjusted            3      1,970,300

(n) Not significant
** Significant at 1 per cent level

The analysis of Table 7.6 confirms what might have been judged from Table 7.2,
that the regression coefficients from the deviation and residual lines
(-164.3 and -179.7 respectively) do not differ significantly.
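The partition may be retraced from the deviation and residual lines of Table 7.2. In the Python sketch below the sum of squares for the difference of regressions is computed from the two coefficients directly, using the standard identity (b' - b)^2 / (1/t' + 1/t), where t' and t are the two sums of squares of x; this closed form is a restatement introduced here for the sketch, not a formula quoted from the text.

```python
deviation = (2_093_500.0, -755.5, 4.5974)     # (y^2, xy, x^2), Table 7.2
residual = (4_187_400.0, -6323.1, 35.1846)

b_dev = deviation[1] / deviation[2]           # regression from deviation line
b_res = residual[1] / residual[2]             # regression from residual line

# Deviation sum of squares reduced by its own regression (2 d.f.).
dev_reduced = deviation[0] - deviation[1] ** 2 / deviation[2]

# Difference of regressions (1 d.f.).
diff_ss = (b_dev - b_res) ** 2 / (1.0 / deviation[2] + 1.0 / residual[2])

print(f"b (deviation) = {b_dev:.1f}, b (residual) = {b_res:.1f}")
print(f"deviation, reduced = {dev_reduced:,.0f}")
print(f"difference of regressions = {diff_ss:,.0f}")
```

The difference of regressions comes out somewhat above the 900 of Table 7.6, which is obtained in the book as the difference of two sums of squares each rounded to the nearest hundred; on either figure it is far below the reduced residual mean square of 87,170.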

7.5    STANDARD  ERRORS  OF  ADJUSTED  MEANS 

As  shown  in  the  preceding  section,  the  significance  of  any  comparison 
among  the  adjusted  means  may  be  tested  by  an  analysis  of  variance. 
However,  it  is  convenient  to  quote  the  standard  error  of  the  comparison. 
Suppose two means, $\bar{y}$ and $\bar{y}'$, of n and n' values respectively, are being
compared, and that $\bar{x}$ and $\bar{x}'$ are the corresponding values of the
independent variable. If b is the regression coefficient from the residual line
of the analysis, the adjusted difference is

\[ \bar{y} - \bar{y}' - b(\bar{x} - \bar{x}') \]

and its estimated variance

\[ s^2 \left[ \frac{1}{n} + \frac{1}{n'} + \frac{(\bar{x} - \bar{x}')^2}{t_{11}} \right]; \tag{7.1} \]

in this expression $s^2$ is the reduced residual variance of y and $t_{11}$ the residual
sum of squares of x. The same method is adopted in determining the
variance of more complicated comparisons.

The variance of the adjusted comparison differs from that of the
unadjusted in two respects: the residual variance of y is reduced by
regression on x, and the factor

\[ \frac{1}{n} + \frac{1}{n'} \]

is increased to

\[ \frac{1}{n} + \frac{1}{n'} + \frac{(\bar{x} - \bar{x}')^2}{t_{11}} \]

to allow for errors in the regression coefficient. Finney (1946), noticing
that the increase due to the second cause, averaged over all comparisons,
was proportionately

\[ m/t_{11}, \]

where m is the treatments mean square for x, has suggested combining
both these factors in one effective reduced residual mean square, namely

\[ s^2 \left( 1 + \frac{m}{t_{11}} \right). \]

For many purposes it will be sufficiently accurate to use the effective
variance thus defined, rather than to use a different expression such as
(7.1) for each comparison.

Since  the  variance  of  each  comparison  is  affected  by  the  errors  in  the 
regression  coefficient,  it  seems  appropriate  to  base  any  estimate  of  the 
efficiency  of  the  use  of  covariance  on  the  effective  variance.  Thus,  in 
Example  7.1,  the  effective  reduced  residual  mean  square  is 


\[ 87,170 \left( 1 + \frac{1.1794}{35.1846} \right) = 90,090, \]

and the efficiency of the covariance analysis is more correctly estimated as

\[ \frac{116,320}{90,090} = 1.29, \]

or an increase of 29 per cent.
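Formula (7.1) and the effective mean square may be compared on the figures of Example 7.1. The Python sketch below is illustrative only; the function name is hypothetical, and the particular comparison chosen (the means at -20 and 20 °C, with mean moisture contents taken from Table 7.3) is an arbitrary example.

```python
import math

s2 = 87_170.0      # reduced residual mean square (Table 7.4)
t11 = 35.1846      # residual sum of squares of x (Table 7.2)
m = 1.1794         # temperatures mean square for x
n = 20             # specimens contributing to each temperature mean


def var_adjusted_difference(x_bar, x_bar2, n1=n, n2=n):
    """Estimated variance (7.1) of the difference of two adjusted means."""
    return s2 * (1.0 / n1 + 1.0 / n2 + (x_bar - x_bar2) ** 2 / t11)


# Exact standard error for one comparison: -20 deg C against 20 deg C.
se_exact = math.sqrt(var_adjusted_difference(20.20, 19.60))

# Effective mean square: one value serving for all comparisons.
s2_eff = s2 * (1.0 + m / t11)
se_effective = math.sqrt(s2_eff * (1.0 / n + 1.0 / n))
efficiency = 116_320.0 / s2_eff

print(f"s.e. from (7.1)       = {se_exact:.1f}")
print(f"s.e. from effective   = {se_effective:.1f}")
print(f"effective mean square = {s2_eff:,.0f}, efficiency = {efficiency:.2f}")
```

For this pair of temperatures the moisture means differ by more than the average, so the exact standard error is a little larger than that from the effective mean square; for pairs with nearly equal moisture means the reverse holds.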


7.6    EXAMPLE  OF  AN  EXPLANATORY  VARIABLE 

Example  7.2  The  Effect  of  Loss  in  Weight  Due  to  Fungal  Decay  on 
the  Impact  Strength  of  Specimens  of  Coachwood  (Ceratopetalum 
apetalum).  In  a  study  of  the  effects  of  fungal  decay  on  the  mechanical 
strength  of  timber,  specimens  of  coachwood  were  exposed  for  varying  periods 
to  fungal  decay.  The  specimens  were  weighed  before  and  after  decay  and  then 
subjected  to  the  Izod  impact  test.  In  Table  7.7  are  shown  for  each  specimen 
the percentage loss in weight (x) and the corresponding Izod value (y). It was
hoped  that  the  Izod  value  could  be  satisfactorily  estimated  from  the  loss  in 
weight.  This  would  mean  that  the  loss  in  weight  would  also  account  for  the 
effect  of  duration  of  exposure  to  fungal  decay  on  the  Izod  value  of  the  specimens. 

The  analysis  of  variance  and  covariance  is  set  out  in  Table  7.8.  Inspection 
of  the  table  shows  that,  as  might  be  expected,  the  effect  of  treatment  on  weight 
loss  is  highly  significant  (mean  squares  for  treatments  and  residual,  895.1  and 
25.51  respectively;  F  =  35.1);  hence  weight  loss  is  a  possible  explanatory 
variable. 
TABLE 7.7

Izod Values (y) and Percentage Loss in Weight (x) of 42
Coachwood Specimens after Exposure for Varying Times to
Fungal Decay

            26 Days         35 Days         49 Days         63 Days
            y      x        y      x        y      x        y      x
           30    8.0       34    9.1       27   15.0       16   23.4
           28   10.5       19   19.4       20   21.0       16   28.2
           30    8.2       22   20.5       14   24.2       20   29.5
           29   13.0       25   14.2       18   15.3       16   22.2
           28   10.1       22   11.0       12   17.3        6   40.2
           17   16.2       25   19.1       18   24.7       16   36.6
           24   16.1       22   16.0       13   23.9       12   35.5
           22   13.4       23   11.1       21   20.2       11   42.1
           30   13.0       28   12.0       24   13.6        9   42.5
           30    7.3       24   17.1                       22   25.8
           25   14.0       30   13.3
                           24    8.9

Total     293  129.8      298  171.7      167  175.2      144  326.0
No.        11              12               9              10

TABLE 7.8

Analysis of Covariance of Izod Value and Percentage
Loss in Weight

                     Sums of Squares and Products
              D.F.      y^2         xy          x^2
Treatments      3     1005.64    -1572.32     2685.41
Residual       38      782.84     -551.92      969.41

Total          41     1788.48    -2124.24     3654.82
In this example the adjusted Izod means are of only incidental interest. We
go directly to the adjusted treatment sum of squares, given in Table 7.9.

TABLE 7.9

Test of Efficacy of Loss in Weight to Account for
Treatment Effects

                             D.F.   Sum of Squares   Mean Square
Difference of regressions      1          0.19
Treatments, reduced            2         85.04

Treatments, adjusted           3         85.23          28.41(n)
Residual, reduced             37        468.61          12.67

Total, reduced                40        553.84

(n) Not significant

The analysis shows that the adjusted mean square is not significant, so that weight
loss  has  satisfactorily  accounted  for  the  variation  in  Izod  value.  Had  the  mean 
square  been  significant,  we  could  have  further  analyzed  it  into  terms  for  difference 
of  regression  and  reduced  variation  between  treatments.  The  purpose  of  such 
an  analysis  would  be  to  see  whether  the  significant  adjusted  treatment  effect 
could  be  attributed  to  a  difference  of  regressions.  If  such  a  difference  of 
regressions  exists,  it  may  possibly  be  interpreted  in  terms  of  other  factors 
associated  with  the  loss  in  weight  (for  example,  deterioration  of  the  remaining 
wood  substance),  but  varying  significantly  from  one  treatment  to  another.  In 
the  present  instance,  however,  as  is  shown  in  Table  7.9,  there  is  no  significant 
difference  of  regressions  even  if  it  were  appropriate  to  carry  out  the  detailed 
analysis. 
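The whole of this example can be recomputed from the individual values of Table 7.7. The Python sketch below (function and variable names hypothetical) forms the treatment, residual and total sums of squares and products for the four unequal groups, and from them the reduced and adjusted sums of squares of Table 7.9.

```python
# Izod value y and percentage loss in weight x, by days of exposure (Table 7.7).
data = {
    26: ([30, 28, 30, 29, 28, 17, 24, 22, 30, 30, 25],
         [8.0, 10.5, 8.2, 13.0, 10.1, 16.2, 16.1, 13.4, 13.0, 7.3, 14.0]),
    35: ([34, 19, 22, 25, 22, 25, 22, 23, 28, 24, 30, 24],
         [9.1, 19.4, 20.5, 14.2, 11.0, 19.1, 16.0, 11.1, 12.0, 17.1, 13.3, 8.9]),
    49: ([27, 20, 14, 18, 12, 18, 13, 21, 24],
         [15.0, 21.0, 24.2, 15.3, 17.3, 24.7, 23.9, 20.2, 13.6]),
    63: ([16, 16, 20, 16, 6, 16, 12, 11, 9, 22],
         [23.4, 28.2, 29.5, 22.2, 40.2, 36.6, 35.5, 42.1, 42.5, 25.8]),
}


def sp(u_groups, v_groups):
    """(treatments, residual, total) sums of products, one-way classification."""
    all_u = [u for g in u_groups for u in g]
    all_v = [v for g in v_groups for v in g]
    cf = sum(all_u) * sum(all_v) / len(all_u)          # correction term
    total = sum(u * v for u, v in zip(all_u, all_v)) - cf
    treat = sum(sum(gu) * sum(gv) / len(gu)
                for gu, gv in zip(u_groups, v_groups)) - cf
    return treat, total - treat, total


ys = [d[0] for d in data.values()]
xs = [d[1] for d in data.values()]
yy, xy, xx = sp(ys, ys), sp(xs, ys), sp(xs, xs)


def reduced(i):
    """Sum of squares of y on line i, reduced by regression on x."""
    return yy[i] - xy[i] ** 2 / xx[i]


treat_red, resid_red, total_red = reduced(0), reduced(1), reduced(2)
treat_adj = total_red - resid_red           # treatments, adjusted (3 d.f.)
diff_regr = treat_adj - treat_red           # difference of regressions (1 d.f.)
F_adj = (treat_adj / 3) / (resid_red / 37)

print(f"adjusted = {treat_adj:.2f}, reduced = {treat_red:.2f}, "
      f"difference = {diff_regr:.2f}")
print(f"residual, reduced = {resid_red:.2f}, F = {F_adj:.2f}")
```

With only one treatment line, the adjusted sum of squares obtained by pooling treatments with residual is simply the reduced total less the reduced residual, which is the form used here.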

7.7    FURTHER  APPLICATIONS  OF  COVARIANCE 
ANALYSIS 

Covariance  analysis  may  readily  be  applied  to  the  adjustment  of  data 
for  two  or  more  independent  variables.  For  each  line  of  the  analysis — 
treatments,  residual  and  total — the  reduced  sums  of  squares  of  the 
dependent  variable  are  determined.  The  analysis  is  based  on  these 
reduced sums of squares. If there are p independent variables, q degrees
of  freedom  for  treatments,  and  n  for  total,  the  resulting  comparisons  have 
the  following  degrees  of  freedom : 

Difference of regressions     p

Treatments, reduced           q - p

Residual, reduced             n - p - q

Total, reduced                n - p
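With several independent variables the reduction of a sum of squares requires the matrix of sums of squares and products of the x's. The Python sketch below, using NumPy, is one way of writing it; the function names are hypothetical, and the matrix form S_yy - S_xy' S_xx^{-1} S_xy is the usual multiple-regression reduction rather than a formula quoted from this section. The second function merely tabulates the degrees of freedom listed above.

```python
import numpy as np


def reduced_ss(S_yy, S_xy, S_xx):
    """S_yy less the regression sum of squares on p independent variables.

    S_xy : p sums of products of y with each x
    S_xx : p x p matrix of sums of squares and products of the x's
    """
    S_xy = np.atleast_1d(np.asarray(S_xy, dtype=float))
    S_xx = np.atleast_2d(np.asarray(S_xx, dtype=float))
    return float(S_yy - S_xy @ np.linalg.solve(S_xx, S_xy))


def degrees_of_freedom(p, q, n):
    """Degrees of freedom for p variables, q for treatments, n for total."""
    return {
        "difference of regressions": p,
        "treatments, reduced": q - p,
        "residual, reduced": n - p - q,
        "total, reduced": n - p,
    }


# One independent variable: residual line of Table 7.8.
print(reduced_ss(782.84, -551.92, 969.41))
print(degrees_of_freedom(p=1, q=3, n=41))

# Two independent variables: hypothetical figures, for illustration only.
print(reduced_ss(10.0, [2.0, 1.0], [[4.0, 1.0], [1.0, 2.0]]))
```

Applied to each of the treatments, residual and total lines in turn, the first function supplies the reduced sums of squares from which the analysis is built up exactly as in the case of a single variable.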
In such an analysis, some of the variables may be environmental and
some  explanatory.  The  interpretation  of  the  results  of  the  analysis  must 
take  these  possibilities  into  account. 

The  use  of  covariance  analysis  to  eliminate  a  causal  or  explanatory 
variable  has  many  applications  in  statistical  work.  The  method  is 
particularly  useful  in  testing  hypotheses.  For  instance,  the  hypothesis  to 
be  tested  may  be  that  the  variation  of  a  number  of  variables  between 
groups  is  attributable  to  the  variation  of  one  of  them  or  of  some  compound 
of  them.  This  question  arises  in  discriminant  analysis,  where  the  linear 
compound  of  a  set  of  variables  that  best  distinguishes  between  a  number 
of  groups  is  sought.  The  linear  compound  specified  by  the  null  hypothesis 
is  then  the  "explanatory  variable";  to  test  its  adequacy  as  a  discriminant 
function  we  perform  an  analysis  of  covariance  and  test  the  significance  of 
the  reduced  variation  in  the  remaining  variables.  In  the  same  way,  the 
adequacy  of  two  or  more  linear  compounds  to  account  for  the  variation 
of  a  set  of  variables  may  be  tested,  by  eliminating  their  joint  effects  by 
covariance.  These  questions  are  discussed  fully  in  Chapter  10,  where  it  is 
shown  that  the  practically  useful  tests  of  significance  in  multivariate 
analysis  are  generalizations  of  the  analysis  of  covariance. 


CHAPTER    8 


The  Treatment 

of  Heterogeneous  Data 


8.1     GENERAL 

In  previous  chapters  we  have  been  concerned  mainly  with  the  analysis 
of  homogeneous  data,  that  is,  data  that  may  be  assumed  to  have  been 
drawn  from  a  single  population.  Only  in  Chapter  7,  in  discussing  the 
analysis  of  covariance,  have  we  considered  data  that  were  divided  into 
groups  according  to  treatment  classifications.  However,  whether  the 
object  was  to  make  an  adjustment  to  the  dependent  variable  for  the  effects 
of  the  independent  variables,  or  to  find  an  interpretation  of  the  treatment 
effect  in  terms  of  the  independent  variables,  the  purpose  of  the  analysis  was 
to  eliminate  extraneous  sources  of  variation  in  determining  the  relationship. 
The  relationship  between  the  variables  was  assumed  to  be  the  same  for  each 
treatment. 

When  the  data  are  not  homogeneous,  their  interpretation  can  become 
complicated.  This  chapter  will  deal  with  a  number  of  different  problems 
which  arise  with  heterogeneous  data.  Clearly,  some  such  problems 
would  be  so  special,  and  their  analysis  so  complicated,  as  to  be  outside  the 
scope  of  this  book.     Hence  only  a  selection  can  be  given  here. 

8.2    COMPARISON  OF  REGRESSION  EQUATIONS 
FROM  INDEPENDENT  SETS  OF  DATA 

When  measurements  have  been  made  of  several  variables  in  a  number 
of  different  sets  of  data,  the  question  arises  whether  the  same  regression 
relationship  will  apply  to  each  set.  Even  if  the  regression  coefficients  are 
equal  for  each  set,  the  constant  terms  may  differ,  so  that  the  regression 
lines  will  be  parallel  rather  than  coincident.  For  example,  in  biological 
assay,  when  a  compound  of  unknown  strength  is  being  tested  against  a 
known  standard,  varying  doses  of  each  are  applied,  and  the  organism's 
response  is  measured.  For  each  compound,  the  regression  of  response  on 
dosage is determined; if a pair of parallel regression lines fits the data
satisfactorily, the distance between the lines (measured parallel to the
dosage  axis)  gives  the  relative  potency  of  the  two  compounds. 

If  the  regression  equations  for  the  different  sets  are  identical,  even  to  the 
constant  term,  the  sets  may  be  regarded  as  equivalent  with  respect  to  the 
dependent  variable,  any  differences  between  the  groups  being  attributable 
to  differences  in  the  values  of  the  independent  variables.  This  case  was 
considered  in  Chapter  7  in  the  discussion  of  explanatory  variables. 

The  significance  of  the  difference  among  the  regression  coefficients  for 
different  sets  may  readily  be  tested,  by  means  of  the  analysis  of  variance. 
On  the  hypothesis  that  the  regression  coefficients  in  the  populations  are  the 
same,  the  common  set  of  coefficients  may  be  estimated  from  the  combined 
sums  of  squares  and  products  within  the  sets,  and  the  regression  sum  of 
squares  determined  from  the  combined  data.  This  sum  of  squares  would 
be  identical  with  the  sum  of  the  regression  sums  of  squares  from  each  set, 
if  the  regression  coefficients  were  in  fact  the  same  for  each  set;  hence,  the 
difference  between  the  sum  of  the  regression  sums  of  squares  for  each  set 
and  the  combined  regression  sum  of  squares  gives  a  criterion  appropriate 
for  an  over-all  test  of  differences  among  the  coefficients.  This  is  what  is 
generally  required.  If,  however,  it  is  considered  to  be  of  interest  to 
compare  individual  coefficients  in  the  different  sets,  the  significance  of 
their  deviations  from  their  weighted  mean  may  be  determined. 

To  carry  out  these  tests  we  shall  require  (i)  the  combined  sums  of  squares 
and  products  for  the  different  groups,  which  we  define  as  the  sums  of  the 
corresponding  sums  of  squares  and  products  within  groups ;  and  (ii)  the 
over-all  sums  of  squares  and  products,  which  we  define  as  the  total  sums  of 
squares  and  products  over  all  the  groups,  regardless  of  group  differences. 
Those  familiar  with  the  analysis  of  variance  will  recognize  that  the  over-all 
sums  of  squares  and  products  exceed  the  corresponding  combined  elements 
by  the  sums  of  squares  and  products  between  groups. 
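The relation between the combined and the over-all sums of squares and products can be checked numerically. The following is a minimal sketch in Python with NumPy; the data are hypothetical random numbers and the helper name `ssp` is ours, introduced only for illustration.

```python
import numpy as np

def ssp(z):
    """Corrected sums of squares and products of the columns of z."""
    d = z - z.mean(axis=0)
    return d.T @ d

# Hypothetical data: three groups of unequal size, columns (x1, x2, y).
rng = np.random.default_rng(0)
groups = [rng.normal(loc=shift, size=(n, 3))
          for shift, n in [(0.0, 8), (1.0, 6), (2.5, 7)]]

combined = sum(ssp(g) for g in groups)   # (i) sum of the within-group elements
over_all = ssp(np.vstack(groups))        # (ii) grouping ignored

# Between-group elements: group means about the grand mean, weighted by group size.
grand = np.vstack(groups).mean(axis=0)
between = sum(len(g) * np.outer(g.mean(axis=0) - grand, g.mean(axis=0) - grand)
              for g in groups)

identity_holds = np.allclose(over_all, combined + between)
print(identity_holds)
```

The check holds for any grouping of the data, since it is the usual analysis-of-variance identity applied to each sum of squares and each sum of products.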

Suppose that there are m sets of data, with p independent variables
$x_1, x_2, \ldots, x_p$. We use the following notation:

$n_r$       number in rth group
$n$         total number
$u_r$       sum of squares of y in the rth set
$p_{ri}$    sum of products of y with $x_i$ in the rth set
$t_{rhi}$   sum of products of $x_h$ and $x_i$ in the rth set
$b_{ri}$    partial regression coefficient on $x_i$ in the rth set
$t_{chi}$   combined sum of products of $x_h$ and $x_i$; similarly, other
            quantities in the combined analysis will be indicated by the subscript c
$t_{ohi}$   over-all sum of products of $x_h$ and $x_i$; similarly, other
            quantities in the over-all analysis will be indicated by the subscript o
(i) Test of Differences among Regression Coefficients (Parallelism)

The sum of squares for regression for the rth set is

$$\sum_i b_{ri} p_{ri}, \quad \text{with } p \text{ degrees of freedom},$$

where

$$b_{ri} = \sum_h p_{rh} t_r^{hi},$$

$t_r^{hi}$ denoting the elements of the inverse of the matrix $(t_{rhi})$.

In the same way, the combined regression coefficients are

$$b_{ci} = \sum_h p_{ch} t_c^{hi},$$

so that the sum of squares for the combined regression, also with p degrees
of freedom, is

$$\sum_i b_{ci} p_{ci}.$$

Hence the sum of squares for testing difference of regressions is

$$\sum_r \sum_i b_{ri} p_{ri} - \sum_i b_{ci} p_{ci}, \tag{8.1}$$

with $(m - 1)p$ degrees of freedom.

Since

$$p_{ci} = \sum_r p_{ri},$$

the sum of squares (8.1) may be written in the form

$$\sum_r \sum_i (b_{ri} - b_{ci}) p_{ri},$$

in which its dependence on the differences of the regression coefficients in
the different groups is clearly shown.

The  analysis  of  variance  for  testing  difference  of  regressions  is  shown  in 
Table  8.1. 

TABLE 8.1

                              D.F.            Sum of Squares

Combined regression           $p$             $\sum_i b_{ci} p_{ci}$
Difference of regressions     $(m - 1)p$      $\sum_r \sum_i b_{ri} p_{ri} - \sum_i b_{ci} p_{ci}$
Combined residual             $n - mp - m$    $u_c - \sum_r \sum_i b_{ri} p_{ri}$

Total within groups           $n - m$         $u_c$
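The lines of Table 8.1 may be computed directly from the sums of squares and products within each set. The following Python sketch assumes NumPy and uses hypothetical simulated data; the function names `ssp`, `regression_ss` and `parallelism_table` are ours and are introduced only for illustration.

```python
import numpy as np

def ssp(z):
    """Corrected sums of squares and products; the last column of z is y."""
    d = z - z.mean(axis=0)
    return d.T @ d

def regression_ss(s):
    """Regression sum of squares, sum_i b_i p_i, from an SSP matrix s."""
    t, p = s[:-1, :-1], s[:-1, -1]
    b = np.linalg.solve(t, p)          # same as b_i = sum_h p_h t^{hi}
    return b @ p

def parallelism_table(groups):
    """Rows of Table 8.1 as (label, degrees of freedom, sum of squares)."""
    m = len(groups)
    n = sum(len(g) for g in groups)
    p = groups[0].shape[1] - 1
    within = [ssp(g) for g in groups]
    combined = sum(within)
    u_c = combined[-1, -1]
    reg_c = regression_ss(combined)
    reg_sep = sum(regression_ss(s) for s in within)
    return [
        ("Combined regression", p, reg_c),
        ("Difference of regressions", (m - 1) * p, reg_sep - reg_c),
        ("Combined residual", n - m * p - m, u_c - reg_sep),
        ("Total within groups", n - m, u_c),
    ]

# Hypothetical data: three sets with common slopes but different constant terms.
rng = np.random.default_rng(1)
groups = []
for intercept in (0.0, 1.0, 2.0):
    x = rng.normal(size=(12, 2))
    y = intercept + x @ np.array([0.5, -0.3]) + rng.normal(scale=0.2, size=12)
    groups.append(np.column_stack([x, y]))

table = parallelism_table(groups)
diff, resid = table[1], table[2]
f_ratio = (diff[2] / diff[1]) / (resid[2] / resid[1])
for label, df, ss in table:
    print(f"{label:<28}{df:>4}{ss:12.5f}")
```

The variance ratio `f_ratio` would be referred to the F distribution with $(m - 1)p$ and $n - mp - m$ degrees of freedom.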


It will seldom be required to make more detailed tests of differences
among regression coefficients. However, it may sometimes be of interest
to determine whether the set of regression coefficients on one of the
independent variables contributes significantly to the heterogeneity; this
set may then be tested separately. For the comparison of the regression
coefficients in different groups for one of the independent variables, we
calculate the sum of squares of deviations of these coefficients from their
weighted mean.

The weighted mean of the coefficients $b_{ri}$ is

$$\bar{b}_i = \frac{\sum_r b_{ri}/t_r^{ii}}{\sum_r 1/t_r^{ii}},$$

where $t_r^{ii}$ is the ith diagonal element of the inverse matrix for the rth set;
the sum of squares of deviations is

$$\sum_r \frac{(b_{ri} - \bar{b}_i)^2}{t_r^{ii}} = \sum_r \frac{b_{ri}^2}{t_r^{ii}} - \bar{b}_i^2 \sum_r \frac{1}{t_r^{ii}},$$

with $m - 1$ degrees of freedom. This sum of squares may be tested
against the combined residual mean square.
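A short Python sketch of this comparison follows. It assumes NumPy and takes the weights as the reciprocals of the diagonal elements of each inverse matrix, since the variance of a regression coefficient is proportional to that element. The function name `coefficient_heterogeneity` and the simulated data are hypothetical.

```python
import numpy as np

def coefficient_heterogeneity(within_ssps, i):
    """Weighted mean of the i-th coefficient over sets, and the sum of
    squares of deviations from it (m - 1 degrees of freedom).

    within_ssps: one SSP matrix per set, with y as the last row and column.
    """
    b, w = [], []
    for s in within_ssps:
        t_inv = np.linalg.inv(s[:-1, :-1])
        b.append((t_inv @ s[:-1, -1])[i])
        w.append(1.0 / t_inv[i, i])        # weight = 1 / (diagonal inverse element)
    b, w = np.array(b), np.array(w)
    b_bar = (w * b).sum() / w.sum()
    ss = (w * b**2).sum() - b_bar**2 * w.sum()
    return b_bar, ss

# Hypothetical check with a single independent variable.
rng = np.random.default_rng(2)
ssps = []
for slope in (0.8, 1.0, 1.3):
    x = rng.normal(size=15)
    y = slope * x + rng.normal(scale=0.3, size=15)
    d = np.column_stack([x, y])
    d = d - d.mean(axis=0)
    ssps.append(d.T @ d)

b_bar, ss = coefficient_heterogeneity(ssps, 0)
comb = sum(ssps)
# Difference-of-regressions sum of squares (8.1) for one independent variable.
ss_81 = sum(s[0, 1]**2 / s[0, 0] for s in ssps) - comb[0, 1]**2 / comb[0, 0]
print(b_bar, ss, ss_81)
```

With a single independent variable the weighted mean reduces to the combined coefficient and the sum of squares reduces to (8.1), which gives a convenient check on the computation.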

(ii)  Test  of  Differences  of  Position  (Coincidence) 

If the difference of regressions is not significant, it may be assumed that
the regression lines for the different sets are parallel, the combined
regression coefficients $b_{ci}$ being applicable. The question then arises
whether the differences in the position of these parallel lines are significant.
To test this, a single line is fitted to all the data, regardless of group
differences. The over-all regression coefficients are

$$b_{oi} = \sum_h p_{oh} t_o^{hi},$$

where $t_o^{hi}$ are the elements of the inverse of the over-all matrix of sums
of squares and products, giving an over-all regression sum of squares of

$$\sum_i b_{oi} p_{oi}.$$

The  over-all  sum  of  squares  of  y  has  n  —  1  degrees  of  freedom,  an  increase 
of  m  —  1  degrees  of  freedom  over  the  total  within  groups,  this  increase 
representing  the  variation  between  groups.  To  derive  the  adjusted  sum  of 
squares  between  groups,  we  add  to  the  sum  of  squares  between  groups  the 
combined  regression  sum  of  squares  and  deduct  the  over-all  regression 
sum  of  squares.  As  might  be  expected,  this  is  the  same  as  the  derivation 
of  the  adjusted  sum  of  squares  in  covariance  analysis,  since  a  test  of 
difference  of  positions  is  in  fact  a  test  of  the  means  adjusted  for  the 
independent  variables. 
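The equivalence with the covariance adjustment can be seen by regrouping the terms. The following worked identity uses $u_o$ and $u_c$ for the over-all and combined sums of squares of y, and $b_{oi}$, $p_{oi}$, $b_{ci}$, $p_{ci}$ for the over-all and combined coefficients and sums of products; it is offered as a sketch of the reasoning rather than as a separate result.

```latex
\begin{align*}
\text{adjusted S.S. between groups}
  &= (u_o - u_c) + \sum_i b_{ci} p_{ci} - \sum_i b_{oi} p_{oi} \\
  &= \Bigl(u_o - \sum_i b_{oi} p_{oi}\Bigr) - \Bigl(u_c - \sum_i b_{ci} p_{ci}\Bigr), \\
\text{degrees of freedom}
  &= (n - 1 - p) - (n - m - p) = m - 1 .
\end{align*}
```

That is, the adjusted sum of squares is the residual about a single fitted line less the residual about parallel lines with the combined coefficients.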
The analysis of variance, in which tests of difference of regression and
of position are combined, may be set out as in Table 8.2. In setting out
the tests in this way we should not test difference of positions unless
difference of regressions is in fact not significant.

TABLE 8.2

                              D.F.            Sum of Squares

Over-all regression           $p$             $\sum_i b_{oi} p_{oi}$
Difference of positions       $m - 1$         $u_o - u_c - \sum_i b_{oi} p_{oi} + \sum_i b_{ci} p_{ci}$
Difference of regressions     $(m - 1)p$      $\sum_r \sum_i b_{ri} p_{ri} - \sum_i b_{ci} p_{ci}$
Combined residual             $n - mp - m$    $u_c - \sum_r \sum_i b_{ri} p_{ri}$

Total                         $n - 1$         $u_o$

8.3    AN  EXAMPLE 

Example  8.1  Biological  Assay  of  Oestrogenic  Activity  of  Green 
Clovers.  Clover  (Trifolium  subterraneum  L.)  contains  a  material  which  acts 
as  an  oestrogen.  In  order  to  assess  the  oestrogenic  activity  of  the  clover,  and 
the  variation  in  activity  in  different  strains,  Alexander  and  Watson  (1951)  took 
samples  of  clover  of  various  strains,  dried  them,  and  fed  them  at  various  levels 
of  daily  dose  to  spayed  female  guinea  pigs.  The  animals  were  killed  after  two 
days  of  the  feeding  treatments  and  the  uterine  weights  determined.  The 
uterine  weight  was  taken  as  a  measure  of  the  oestrogenic  activity  of  the  clovers. 
For  the  present  analysis  we  use  the  results  for  standard  clover  (a  homogeneous 
sample  of  dehydrated  clover  in  tablet  form)  and  for  clover  of  the  Bacchus  Marsh 
and  Dwalganup  strains. 

If relative potencies of different materials are to be determined (in terms of the
ratios of dose levels of different materials giving the same response), it is convenient
to express the effect of each as a regression of response on the logarithm
of dose (designated "dosage" by Finney). In this experiment it was found that
the regression on dosage was approximately linear, and the residual variances
were homogeneous, if log uterine weight was taken as the dependent variable.
Table 8.3 sets out the results for log daily dose and log uterine weight for each
animal, together with log body weight at the beginning of the experiment,
which was included in the analysis to provide a correction for the initial variation
among the animals.

TABLE 8.3

Results of Experiment on Effect of Dose of Clover on Uterine Weight

$x_1 = \log_{10}$ (daily dose, g)
$x_2 = \log_{10}$ (initial body weight, g)
$y = \log_{10}$ (100 x uterine weight, g)

              Standard                 Bacchus Marsh               Dwalganup
         x1      x2      y         x1      x2      y         x1      x2      y

        1.167   2.690   1.878     1.375   2.708   1.834     1.498   2.699   1.882
        1.167   2.665   1.880     1.375   2.681   1.772     1.498   2.690   1.925
        1.167   2.663   1.941     1.375   2.672   1.718     1.498   2.672   1.778
        1.167   2.643   1.898     1.375   2.655   1.786     1.498   2.653   1.843
        1.167   2.613   1.823     1.375   2.643   1.766     1.498   2.643   1.955
                                  1.375   2.633   1.760     1.498   2.556   1.713

        1.025   2.724   1.830     1.236   2.699   1.731     1.350   2.708   1.873
        1.025   2.672   1.916     1.236   2.677   1.682     1.350   2.681   1.843
        1.025   2.659   1.849     1.236   2.672   1.719     1.350   2.663   1.801
        1.025   2.643   1.873     1.236   2.663   1.713     1.350   2.653   1.816
        1.025   2.633   1.737     1.236   2.653   1.770     1.350   2.653   1.883
                                  1.236   2.655   1.665     1.350   2.643   1.818

        0.886   2.708   1.827     1.093   2.699   1.708     1.197   2.699   1.854
        0.886   2.681   1.865     1.093   2.681   1.639     1.197   2.681   1.772
        0.886   2.663   1.757     1.093   2.663   1.747     1.197   2.663   1.810
        0.886   2.643   1.785     1.093   2.663   1.622     1.197   2.659   1.744
        0.886   2.623   1.753     1.093   2.643   1.719     1.197   2.653   1.732
                                  1.093   2.633   1.665     1.197   2.643   1.751

        0.740   2.633   1.713
        0.740   2.699   1.727
        0.740   2.690   1.737
        0.740   2.672   1.740
        0.740   2.653   1.725

Total  19.090  53.270  36.254    22.224  47.993  31.016    24.270  47.912  32.793
Mean    0.954   2.664   1.813     1.235   2.666   1.723     1.348   2.662   1.822

To test the homogeneity of the regressions, it is necessary to determine the
regression equations for each strain and the combined regression coefficients.
In Table 8.4 the sums of squares and products necessary for these calculations
are given. In making these calculations, it is desirable to record for each strain
the crude sums of squares and products, which will be required for the calculation
of the over-all sums of squares and products. We could include the
over-all analysis too at this stage, but we prefer to bring it in after the homogeneity
of the regressions has been tested. The four inverse matrices and
four pairs of regression coefficients (one for each strain, one for the combined
analysis) are set out in Table 8.5, together with the regression sum of squares
for each. Although the regression coefficients appear rather variable, such
variation is possibly due to sampling error. The analysis given in Table 8.6
tests this point.
TABLE 8.6

Analysis of Variance to Test Homogeneity of Regressions

                              D.F.    Sum of Squares    Mean Square

Combined regression             2       0.130 325
Difference of regressions       4       0.007 067        0.001 767 (n)
Combined residual              47       0.090 374        0.001 923

Total within groups            53       0.227 766

(n) Not significant

We  see  from  Table  8.6  that  the  difference  of  regressions  is  not  significant,  so 
that  it  is  valid  to  use  the  combined  regression.  We  can  also  test  the  effect  of 
dose  and  the  efficacy  of  the  adjustment  for  initial  body  weight.  The  standard 
errors of $b_1$ and $b_2$ are

$$\text{S.E.}(b_1) = \sqrt{0.001\,923 \times 1.000\,857} = 0.044,$$
$$\text{S.E.}(b_2) = \sqrt{0.001\,923 \times 23.3046} = 0.21.$$

Both  regression  coefficients  are  highly  significant,  so  that  the  adjustment  for 
body  weight  has  some  effect  on  the  relation  of  uterine  weight  and  dose. 
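The standard errors and the corresponding ratios of coefficient to standard error can be verified with a few lines of Python. The residual mean square and the diagonal inverse elements are those quoted in the text; the coefficients 0.338 and 0.807 are the combined regression coefficients used in the fitted equations of this example. The variable names are ours.

```python
import math

residual_ms = 0.001923                       # combined residual mean square, 47 d.f.
inv_diag = {"b1": 1.000857, "b2": 23.3046}   # diagonal elements of the combined inverse
coef = {"b1": 0.338, "b2": 0.807}            # combined regression coefficients

# Standard error = sqrt(residual mean square x diagonal inverse element).
se = {k: math.sqrt(residual_ms * v) for k, v in inv_diag.items()}
t_ratio = {k: coef[k] / se[k] for k in coef}

for k in coef:
    print(f"{k}: S.E. = {se[k]:.3f}, ratio = {t_ratio[k]:.2f}")
```

Both ratios lie well beyond the 1 per cent point of t for 47 degrees of freedom (roughly 2.7), in agreement with the statement that both coefficients are highly significant.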

To test the difference of positions of the lines, these differences representing
differences in relative potency, we require the over-all analysis of regression
effects. The total sums of squares and products are shown in Table 8.7, and
the inverse matrix, over-all regression coefficients, and sum of squares in Table
8.8. The analysis to test the difference of positions is shown in Table 8.9. This
analysis could have been carried out by the method shown in Example 7.1, with
the modifications necessary for more than one independent variable.

TABLE 8.7

Total Sums of Squares and Products

               x1             x2             y

x1          2.591 015     -0.028 235      0.248 958
x2         -0.028 235      0.043 758      0.022 135
y                                         0.336 288

TABLE 8.8

Inverse Matrix and Over-all Regression Coefficients and Regression Sum of Squares

            Inverse matrix                $b_i$

          0.388 682      0.250 80       0.102 32
          0.250 80      23.014 8        0.571 87

Regression sum of squares = 0.038 132

TABLE 8.9

Analysis of Variance to Test Homogeneity of Positions and Regressions

                              D.F.    Sum of Squares    Mean Square

Over-all regression             2       0.038 132
Difference of positions         2       0.200 715        0.100 4 **
Difference of regressions       4       0.007 067        0.001 767 (n)
Combined residual              47       0.090 374        0.001 923

Total                          55       0.336 288

(n) Not significant
** Significant at 1 per cent level

The  difference  of  positions  is  highly  significant.  In  this  example,  there  is 
really  no  need  to  make  this  test,  since  the  difference  of  positions  represents 
potency  differences  that  are  already  known  to  exist.  The  analysis  has  been 
given  in  order  to  show  the  method. 
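As a check on the arithmetic of this example, the analysis of Table 8.9 can be recomputed from the data of Table 8.3. The following Python sketch assumes NumPy; the data are transcribed from Table 8.3, while the helper names `to_array`, `ssp` and `regression_ss` are ours and are introduced only for illustration.

```python
import numpy as np

# {x1 level: [(x2, y), ...]} for each strain, transcribed from Table 8.3.
standard = {
    1.167: [(2.690, 1.878), (2.665, 1.880), (2.663, 1.941), (2.643, 1.898), (2.613, 1.823)],
    1.025: [(2.724, 1.830), (2.672, 1.916), (2.659, 1.849), (2.643, 1.873), (2.633, 1.737)],
    0.886: [(2.708, 1.827), (2.681, 1.865), (2.663, 1.757), (2.643, 1.785), (2.623, 1.753)],
    0.740: [(2.633, 1.713), (2.699, 1.727), (2.690, 1.737), (2.672, 1.740), (2.653, 1.725)],
}
bacchus_marsh = {
    1.375: [(2.708, 1.834), (2.681, 1.772), (2.672, 1.718), (2.655, 1.786), (2.643, 1.766), (2.633, 1.760)],
    1.236: [(2.699, 1.731), (2.677, 1.682), (2.672, 1.719), (2.663, 1.713), (2.653, 1.770), (2.655, 1.665)],
    1.093: [(2.699, 1.708), (2.681, 1.639), (2.663, 1.747), (2.663, 1.622), (2.643, 1.719), (2.633, 1.665)],
}
dwalganup = {
    1.498: [(2.699, 1.882), (2.690, 1.925), (2.672, 1.778), (2.653, 1.843), (2.643, 1.955), (2.556, 1.713)],
    1.350: [(2.708, 1.873), (2.681, 1.843), (2.663, 1.801), (2.653, 1.816), (2.653, 1.883), (2.643, 1.818)],
    1.197: [(2.699, 1.854), (2.681, 1.772), (2.663, 1.810), (2.659, 1.744), (2.653, 1.732), (2.643, 1.751)],
}

def to_array(strain):
    """Rows (x1, x2, y) for one strain."""
    return np.array([(x1, x2, y) for x1, rows in strain.items() for x2, y in rows])

def ssp(z):
    """Corrected sums of squares and products of the columns of z."""
    d = z - z.mean(axis=0)
    return d.T @ d

def regression_ss(s):
    """Regression sum of squares of y on x1 and x2 from a 3 x 3 SSP matrix."""
    return np.linalg.solve(s[:2, :2], s[:2, 2]) @ s[:2, 2]

groups = [to_array(s) for s in (standard, bacchus_marsh, dwalganup)]
within = [ssp(g) for g in groups]
combined, over_all = sum(within), ssp(np.vstack(groups))
u_c, u_o = combined[2, 2], over_all[2, 2]
reg_c, reg_o = regression_ss(combined), regression_ss(over_all)
reg_sep = sum(regression_ss(s) for s in within)
b_c = np.linalg.solve(combined[:2, :2], combined[:2, 2])   # combined coefficients

table_8_9 = {
    "Over-all regression": (2, reg_o),
    "Difference of positions": (2, u_o - u_c - reg_o + reg_c),
    "Difference of regressions": (4, reg_sep - reg_c),
    "Combined residual": (47, u_c - reg_sep),
    "Total": (55, u_o),
}
for label, (df, ss) in table_8_9.items():
    print(f"{label:<28}{df:>4}{ss:12.6f}")
```

Small discrepancies in the last decimal place are to be expected, since the tabulated values were presumably obtained by desk calculation with intermediate rounding.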

The three regression equations, based on the combined regression, are

Standard:        $Y = -0.659 + 0.338x_1 + 0.807x_2$
Bacchus Marsh:   $Y = -0.846 + 0.338x_1 + 0.807x_2$
Dwalganup:       $Y = -0.782 + 0.338x_1 + 0.807x_2$.

Hence the potencies R of each strain relative to the standard are estimated as
follows:

Bacchus Marsh:   $\log R = (-0.846 + 0.659)/0.338 = -0.553$,   $R = 0.280$
Dwalganup:       $\log R = (-0.782 + 0.659)/0.338 = -0.364$,   $R = 0.433$.

The  two  strains  considered  have  potencies  between  a  quarter  and  a  half  that 
of  the  standard.  If  desired,  fiducial  limits  for  the  relative  potencies  may  be 
derived  by  the  methods  given  in  Chapter  6. 
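The potency calculation is simply the horizontal distance between parallel lines, converted from logarithms. A minimal Python sketch, using the constant terms and the common coefficient on log dose from the fitted equations of this example (the function name `relative_potency` is ours):

```python
# Constant terms of the three parallel regression equations.
intercepts = {"Standard": -0.659, "Bacchus Marsh": -0.846, "Dwalganup": -0.782}
b1 = 0.338   # common regression coefficient on log10 dose

def relative_potency(strain, reference="Standard"):
    """Return (log10 R, R) for a strain relative to the reference."""
    log_r = (intercepts[strain] - intercepts[reference]) / b1
    return log_r, 10.0 ** log_r

potencies = {s: relative_potency(s) for s in ("Bacchus Marsh", "Dwalganup")}
for strain, (log_r, r) in potencies.items():
    print(f"{strain}: log R = {log_r:.3f}, R = {r:.3f}")
```

Because the adjustment for body weight has the same coefficient in every line, it cancels from the difference and only the constant terms and the dose coefficient enter.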

8.4    MORE  DETAILED  COMPARISONS  AMONG 

REGRESSION  EQUATIONS— CONCURRENCE  OF 

REGRESSION  LINES 

The comparisons so far given in this chapter have involved straightforward
applications of regression methods. Occasionally more elaborate
comparisons are required. Sometimes a set of regression lines, rather
than being parallel, appear to be concurrent. This type of effect is likely
to occur, for example, when different materials produce a different rate of
response, whereas all give the same response at some fixed level. This
level in practice is often not zero and is generally not exactly known.
When three or more regression lines are concurrent, it is possible to
measure the difference between the effects of different materials or treatments
by means of the ratios of the distances between the lines, measured
along an ordinate. It will be seen that parallel regression lines are simply
a special case of concurrent lines; in fact, the ratio of the effectiveness of
different materials may be measured in the same way as for concurrent
lines, that is, by the distances between the intercepts on a transversal.

For  a  discussion  of  the  fitting  of  concurrent  regression  lines,  reference 
may  be  made  to  the  papers  of  Tocher  (1952)  and  Williams  (1953).  These 
papers  do  not,  however,  present  the  method  of  analysis  in  as  convenient  a 
form  as  that  given  here.  The  method  given  here,  basing  the  tests  of 
significance  on  the  analysis  of  covariance,  is  new. 

We shall denote the point of concurrence of the lines by $(\xi, \eta)$. When
this point is known, the problem is amenable to standard methods, for
regression lines through a given point are fitted in the same way as a
regression line through the origin, as described in Chapter 2. When the
ordinate $\eta$ is unknown, some more detail is required, as will now be shown.

Suppose  that  as  before  there  are  m  groups,  that  there  are  n  values  in  each 
group,  and  that  the  values  of  the  independent  variable  x  are  the  same  for 
each  group. 

We shall require the following notation:

Unrestricted regression
  $y_r$          value of y in rth group
  $\bar{y}_r$    mean of y for rth group
  $p_r$          sum of products of x and y from rth group
  $t_r$          sum of squares of x (the same for each group)
  $b_r$          regression coefficient for rth group
  $\bar{p}$      $\sum_r p_r/m$
  $\bar{b}$      $\sum_r b_r/m$

Concurrent regression
  $p_r'$         $S\,y_r(x - \xi)$
  $t_r'$         $S(x - \xi)^2$
  $b_r'$         regression coefficient for rth concurrent line
  $\bar{p}'$     $\sum_r p_r'/m$
  $x_c, y_c$     estimated coordinates of point of concurrence
  $J$            $n\sum_r (\bar{y}_r - \bar{y})^2$
  $K$            $n\sum_r (\bar{y}_r - \bar{y})(b_r - \bar{b})$
  $L$            $n\sum_r (b_r - \bar{b})^2$

When the ordinate of concurrence $\eta$ is unknown, it may be estimated by
noting that, in this case, the point of concurrence will be on the mean
regression line for all the data. The mean regression line is

$$Y = \bar{y} + \bar{b}(x - \bar{x});$$

hence, the estimate of $\eta$ is given by

$$y_c = \bar{y} + \bar{b}(\xi - \bar{x}).$$

If $\xi$ is also unknown, an estimate is to be substituted.

Once $y_c$ is determined, the restricted regression coefficients are readily
found. We have

$$b_r' = S(y_r - y_c)(x - \xi)/S(x - \xi)^2 = [p_r' + n y_c(\xi - \bar{x})]/t_r'.$$
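The two steps just described, estimating the ordinate and then the restricted coefficients, can be sketched in Python as follows. The sketch assumes NumPy, equal numbers in each group with common values of x, and a given abscissa of concurrence; the function name `fit_concurrent` and the noise-free test data are hypothetical.

```python
import numpy as np

def fit_concurrent(x, ys, xi):
    """x: the n common values of x; ys: m-by-n array of responses;
    xi: given abscissa of concurrence. Returns (y_c, restricted slopes)."""
    x, ys = np.asarray(x, float), np.asarray(ys, float)
    n = x.size
    dx = x - x.mean()
    t = dx @ dx                                   # sum of squares of x
    b = ys @ dx / t                               # unrestricted slopes b_r
    y_c = ys.mean() + b.mean() * (xi - x.mean())  # ordinate on the mean regression line
    p_prime = ys @ (x - xi)                       # S y_r (x - xi)
    t_prime = (x - xi) @ (x - xi)                 # S (x - xi)^2
    b_prime = (p_prime + n * y_c * (xi - x.mean())) / t_prime
    return y_c, b_prime

# Hypothetical noise-free lines through the point (1.0, 2.0).
x = np.array([2.0, 3.0, 4.0, 5.0])
slopes = np.array([0.5, 1.0, 1.5])
ys = 2.0 + slopes[:, None] * (x - 1.0)
y_c, b_prime = fit_concurrent(x, ys, xi=1.0)
print(y_c, b_prime)
```

With exactly concurrent data the estimate of the ordinate and the restricted slopes reproduce the values used to construct the lines, which is a useful check before the method is applied to real observations.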

In assessing the fit of concurrent regression lines, there are several
aspects to be tested. In comparing regression lines for several sets of
data, as shown in Section 8.2, the usual approach is to examine, first,
whether the slopes of the lines differ significantly, and second, whether, if
the slopes are assumed not to differ, the distances between the parallel
lines fitted to the data differ significantly. Each of these comparisons has
$m - 1$ degrees of freedom. For the present purpose, these two aspects
are viewed in a different way. The comparisons required are set out in
Table 8.10. These comparisons are tested in order, just as the difference
of slopes needs to be tested before the distances between the lines are
examined. Departure from concurrence is tested first; if this is significant,
no choice of point of concurrence will give a satisfactory fit to the data.

TABLE 8.10

                                                    D.F.

Mean regression                                     $1$
Ordinate of concurrence                             $1$
Difference of concurrent regressions                $m - 1$
Departure from concurrence, for given abscissa      $m - 1$

Total variation due to regressions                  $2m$
The sum of squares for concurrent regressions, given by the first three
items in the table above, has $m + 1$ degrees of freedom, since m regression
coefficients and the ordinate of concurrence have been fitted. This sum of
squares is found to be

$$\sum_r \frac{p_r'^2}{t_r'} + \frac{y_c^2\, m n t_r}{t_r'}.$$

Of the two parts, the first represents the regressions through the point
$(\xi, 0)$, and the second the departure of $y_c$ from zero. In most applications
it would be more appropriate to take the two parts as the regression
through $(\xi, \eta)$, and the departure of $y_c$ from $\eta$, where $\eta$ is some hypothetical
value of the ordinate. Thus, to test the significance of the departure of $y_c$
from $\eta$, the appropriate sum of squares is

$$\frac{(y_c - \eta)^2\, m n t_r}{t_r'}.$$

This is exactly analogous to the sum of squares for testing the significance
of the constant term in a regression equation.

From the form of the first part (the sum of squares for the regressions
through a given point), we see that the sum of squares for the difference of
concurrent regressions, with $m - 1$ degrees of freedom, is given by

$$\frac{1}{t_r'}\sum_r (p_r' - \bar{p}')^2.$$

For purposes of calculation and significance testing, it is advantageous to
express this sum of squares in terms of quantities independent of $\xi$,
namely J, K, and L defined in the list of notation. We find that the sum of
squares for difference of concurrent regressions equals

$$\frac{1}{t_r'}\Bigl[n(\xi - \bar{x})^2 J - 2(\xi - \bar{x})\, t_r K + \frac{t_r^2}{n} L\Bigr].$$

Departures from concurrence are measured by the variation among the
m quantities,

$$y_{rc} = \bar{y}_r + b_r(\xi - \bar{x}) = \bar{y}_r + p_r(\xi - \bar{x})/t_r,$$

which are the ordinates of the unrestricted regression lines at $x = \xi$, and
whose mean is $y_c$.

Since the variance of each of these quantities is proportional to

$$\frac{t_r'}{n t_r},$$

the sum of squares for departure from concurrence is

$$\frac{n t_r}{t_r'}\Bigl(\sum_r y_{rc}^2 - m y_c^2\Bigr) = \frac{t_r}{t_r'}\bigl[J + 2(\xi - \bar{x})K + (\xi - \bar{x})^2 L\bigr],$$

with $m - 1$ degrees of freedom.
The  analysis  of  variance  is  set  out  in  Table  8.11. 
TABLE 8.11

                                          D.F.         Sum of Squares

Mean regression                           $1$          $m\bar{p}'^2/t_r'$
Ordinate of concurrence                   $1$          $m n t_r y_c^2/t_r'$
Difference of concurrent regressions      $m - 1$      $\sum_r (p_r' - \bar{p}')^2/t_r' = \frac{1}{t_r'}\bigl[n(\xi - \bar{x})^2 J - 2(\xi - \bar{x})\, t_r K + \frac{t_r^2}{n} L\bigr]$
Departure from concurrence
  for given abscissa                      $m - 1$      $\frac{t_r}{t_r'}\bigl[J + 2(\xi - \bar{x})K + (\xi - \bar{x})^2 L\bigr]$

Total variation due to regressions        $2m$         $\frac{1}{t_r'}(m\bar{p}'^2 + m n t_r y_c^2) + J + \frac{t_r L}{n}$
Residual                                  $m(n - 2)$   by difference

Total (uncorrected)                       $mn$         $\sum_r S(y_r^2)$

The sum of the first two terms in this analysis equals the correction for
the mean plus the usual sum of squares for mean regression, namely

$$m n \bar{y}^2 + m t_r \bar{b}^2.$$

In  the  table  it  is  split  up  in  a  different  manner  in  order  to  show  the  term 
for  testing  the  significance  of  a  hypothetical  value  of  the  ordinate  of 
concurrence.  Again,  the  residual  sum  of  squares  is  simply  the  sum  of  the 
residual  sums  of  squares  from  the  individual  regressions  and  would 
generally  be  so  determined. 
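The partition of Table 8.11 may be verified numerically. The following Python sketch assumes NumPy, equal group sizes with common values of x, and a given abscissa of concurrence. It takes J, K and L as n times the sums of squares and products of the group means and unrestricted slopes about their averages, which is the form implied by the expressions in the table. The function name `concurrence_anova` and the simulated data are hypothetical.

```python
import numpy as np

def concurrence_anova(x, ys, xi):
    """Regression lines of Table 8.11 for m groups sharing the abscissae x."""
    x, ys = np.asarray(x, float), np.asarray(ys, float)
    m, n = ys.shape
    d = xi - x.mean()
    dx = x - x.mean()
    t = dx @ dx                          # t_r
    t_prime = t + n * d**2               # t_r' = S(x - xi)^2
    y_bar = ys.mean(axis=1)              # group means
    b = ys @ dx / t                      # unrestricted slopes
    J = n * ((y_bar - y_bar.mean())**2).sum()
    K = n * ((y_bar - y_bar.mean()) * (b - b.mean())).sum()
    L = n * ((b - b.mean())**2).sum()
    p_prime_bar = t * b.mean() - n * d * y_bar.mean()   # mean of the p_r'
    y_c = y_bar.mean() + b.mean() * d
    rows = {
        "Mean regression": (1, m * p_prime_bar**2 / t_prime),
        "Ordinate of concurrence": (1, m * n * t * y_c**2 / t_prime),
        "Difference of concurrent regressions":
            (m - 1, (n * d**2 * J - 2 * d * t * K + t**2 * L / n) / t_prime),
        "Departure from concurrence": (m - 1, t * (J + 2 * d * K + d**2 * L) / t_prime),
    }
    total_regression = n * (y_bar**2).sum() + t * (b**2).sum()   # 2m d.f.
    residual = (ys**2).sum() - total_regression                 # m(n - 2) d.f.
    return rows, total_regression, residual

# Hypothetical data: three lines nearly concurrent at x = 0.5.
rng = np.random.default_rng(3)
x = np.arange(1.0, 7.0)
slopes = np.array([0.4, 0.9, 1.4])
ys = 2.0 + slopes[:, None] * (x - 0.5) + rng.normal(scale=0.1, size=(3, 6))
rows, total_regression, residual = concurrence_anova(x, ys, xi=0.5)
for label, (df, ss) in rows.items():
    print(f"{label:<40}{df:>3}{ss:12.4f}")
```

The four component sums of squares add to the total variation due to regressions, which is also the sum over groups of the uncorrected mean and regression sums of squares, so the residual may be found by difference as the table indicates.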

Although it is not necessary for the present analysis, in which $\xi$ is given,
it is instructive, and will also be of use later on, to frame the significance
tests given here in terms of an analysis of covariance.

The analysis of the variation between regression lines is really an
analysis of two variables, $\bar{y}_r$ and $p_r$, which are statistically independent, and
whose variances are in the known ratio $1/n : t_r$. It can be shown that,
since $\xi$ is given as the abscissa of concurrence, the variate $b_r'$ (which is a
linear combination of $\bar{y}_r$ and $p_r$, involving $\xi$) is the explanatory variable,
in terms of which the variation of either $\bar{y}_r$ or $p_r$ can be interpreted.
Actually, since

$$b_r' = [p_r' + n y_c(\xi - \bar{x})]/t_r',$$

in this example $p_r'$ is equivalent to $b_r'$ as a covariance variable and will for
convenience be used in the analysis.

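Now since $\bar{y}_r$ and $p_r$ are statistically independent, the sum of squares for
their joint variation, with $2(m - 1)$ degrees of freedom, is

$$n\sum_r (\bar{y}_r - \bar{y})^2 + \sum_r (p_r - \bar{p})^2/t_r = J + \frac{t_r}{n} L.$$

The sum of squares for $p_r'$ is likewise

$$\frac{1}{t_r'}\sum_r (p_r' - \bar{p}')^2 = \frac{1}{t_r'}\Bigl[n(\xi - \bar{x})^2 J - 2(\xi - \bar{x})\, t_r K + \frac{t_r^2}{n} L\Bigr],$$

so that deduction of the sum of squares for $p_r'$ leaves a remainder

$$\frac{t_r}{t_r'}\bigl[J + 2(\xi - \bar{x})K + (\xi - \bar{x})^2 L\bigr], \tag{8.2}$$

with $m - 1$ degrees of freedom.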

This remainder may alternatively be derived by transforming from
$\bar{y}_r$ and $p_r$ to new independent variables

$$p_r' = p_r - n(\xi - \bar{x})\bar{y}_r$$

and

$$y_{rc} = b_r(\xi - \bar{x}) + \bar{y}_r.$$

Now $p_r'$ and $y_{rc}$ are independent, so the sum of squares for regression of
$y_{rc}$ on $p_r'$ should be distributed accordingly. If the regression is significant,
it reflects on the adequacy of $p_r'$ as an explanatory covariance variable,
and hence on the given value of $\xi$. Hence, if we regard $\xi$ not as given but
as unknown, we may derive fiducial limits by determining that range of
values of $\xi$ for which the regression of $y_{rc}$ on $p_r'$ is not significant. For
this reason we shall call this sum of squares for regression the sum of
squares for the abscissa of concurrence. Of course, this determination is
valid only if the residual variation of $y_{rc}$, which measures departures from
concurrence, is not significant. If there are significant departures from
concurrence, no set of concurrent lines is satisfactory.

The difference between the analysis of covariance here given and that
usually described is that here we have variates whose variances and
covariances are known (apart from a constant factor), so that the population
regression of one on the other can be determined. Then the sum of
squares for regression of $y_{rc}$ on $p_r'$ is really the sum of squares for the
difference between the population regression (which vanishes for these two
variables) and the sample regression.

This analysis has been described in terms of $y_{rc}$ and $p_r'$; however, since
each is a linear function of $\bar{y}_r$ and $p_r$, the same result would have been
obtained if the regression of, say, $\bar{y}_r$ on $p_r'$ had been determined, provided
we deduct the population regression coefficient of $\bar{y}_r$ on $p_r'$, which is found
to be

$$-(\xi - \bar{x})/t_r'.$$

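The most useful method of calculating the regression sum of squares is
by expressing it in terms of the quantities J, K, and L previously given,
which are independent of $\xi$. We have

$$\sum_r y_{rc}(p_r' - \bar{p}') = -\Bigl[(\xi - \bar{x})^2 K - (\xi - \bar{x})\Bigl(\frac{t_r}{n} L - J\Bigr) - \frac{t_r}{n} K\Bigr],$$

$$\sum_r (p_r' - \bar{p}')^2 = n(\xi - \bar{x})^2 J - 2(\xi - \bar{x})\, t_r K + \frac{t_r^2}{n} L.$$

Hence the regression sum of squares (sum of squares for abscissa) equals

$$\frac{n t_r\Bigl[(\xi - \bar{x})^2 K - (\xi - \bar{x})\Bigl(\dfrac{t_r}{n} L - J\Bigr) - \dfrac{t_r}{n} K\Bigr]^2}{t_r'\Bigl[n(\xi - \bar{x})^2 J - 2(\xi - \bar{x})\, t_r K + \dfrac{t_r^2}{n} L\Bigr]}.$$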


We have thus arrived indirectly at a means of testing the concordance
of the data with a given value of $\xi$. Now the sum of squares for the
ordinate of concurrence is a criterion for testing the concordance of $\eta$
with the data, conditional on the given value of $\xi$. Taken together with
the sum of squares for abscissa of concurrence, this enables the concordance
of $\xi$ and $\eta$ jointly with the data to be tested.

The  two  sums  of  squares  for  the  joint  test  are  seen  to  be  sums  of  squares 
of  the  yrc,  one  being  for  the  departure  of  their  mean  yc  from  rj,  the  other 
being  for  regression  on/?/.  Thus  the  tests  of  significance  of  the  position 
of  the  point  of  concurrence  reduce  to  an  analysis  of  variance  of  yrc,  and 
the  test  for  departure  from  concurrence  is  based  on  its  residual  sum  of 
squares. 

These significance tests enable fiducial limits for $\xi$ and $\eta$ to be found,
and we have also shown how $\eta$ may be estimated. To estimate $\xi$, we
equate to zero the sum of squares for abscissa, or, more simply, the sum of
products of $p_r'$ and $y_{rc}$. This gives a quadratic equation, one of whose
roots maximizes the sum of squares for departure from concurrence,
whereas the other minimizes it. From general regression theory it can be
shown that the sum of squares for departure from concurrence, being the
residual sum of squares from a regression, is given by the ratio

$$\frac{\dfrac{t_x}{n}(JL - K^2)}{\dfrac{1}{t_x'}\sum_r (p_r' - \bar{p}')^2}.$$

Since the numerator is independent of $\xi$, it follows that the ratio is
minimized when the denominator, which is the sum of squares for difference
of concurrent regressions, is maximized, and vice versa. Thus, in
estimating $\xi$, the appropriate value to choose is the one that maximizes
difference of regressions and minimizes departure from concurrence.

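If we put $x_c - \bar{x} = z$, the equation for z is

$$z^2 K - z\left(\frac{t_x}{n}L - J\right) - \frac{t_x}{n}K = 0,$$

giving

$$z = \frac{\dfrac{t_x}{n}L - J \pm \sqrt{\left(J + \dfrac{t_x}{n}L\right)^2 - \dfrac{4t_x}{n}(JL - K^2)}}{2K}.$$

The roots of the equation are real, as is apparent from geometrical
considerations.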

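Table 8.12 shows the relevant sums of squares resulting from the
analysis of the $y_{rc}$.

TABLE 8.12

                             D.F.    Calculation of Sum of Squares

Ordinate of concurrence      1       $\dfrac{mnt_x}{t_x'}\left[\bar{y} + b(\xi - \bar{x}) - \eta\right]^2$

Abscissa of concurrence      1       $\dfrac{n t_x\left[(\xi - \bar{x})^2 K - (\xi - \bar{x})\left(\frac{t_x}{n}L - J\right) - \frac{t_x}{n}K\right]^2}{t_x'\left[n(\xi - \bar{x})^2 J - 2(\xi - \bar{x})t_x K + \frac{t_x^2}{n}L\right]}$

Departure from               m - 2   $\dfrac{t_x t_x'(JL - K^2)}{n\left[n(\xi - \bar{x})^2 J - 2(\xi - \bar{x})t_x K + \frac{t_x^2}{n}L\right]}$
concurrence

Total                        m       $\dfrac{n t_x}{t_x'}\sum_r (y_{rc} - \eta)^2$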
From the foregoing it can be seen that the fitting and testing of
concurrent lines may also be carried out in the general case, when the number
of observations and their values vary from group to group. For given $\xi$,
the $y_{rc}$ may be determined for each group; since their variances differ, a
weighted analysis would be employed. The departures from concurrence
would be tested by the weighted sum of squares of departures of the $y_{rc}$
from their weighted mean, and the ordinate of concurrence by the difference
between the weighted mean and the hypothetical value $\eta$. However, in
the general case it is not possible to define an explanatory variable, so that
an exact test for the abscissa of concurrence cannot be made in the way
just shown.

8.5    AN  EXAMPLE  OF  CONCURRENT  REGRESSIONS 

Example  8.2  The  Fitting  of  Concurrent  Regression  Lines  to  the 
Relationship  between  Burst  Strength  and  Basis  Weight  of  Paper 
Made  under  Different  Conditions.  In  a  study  of  the  effect  of  basis  weight 
on  the  properties  of  laboratory-made  sheets  of  paper,  three  batches  of  pulp 
were  taken  and  beaten  in  the  Lampen  mill  for  different  lengths  of  time.  From 
each  batch,  sheets  of  six  different  basis  weights  were  then  made,  and  mechanical 
tests  carried  out  on  them.  The  burst  strength  results  from  this  experiment  are 
set  out  in  Table  8.13,  together  with  values  calculated  from  them. 


TABLE 8.13

Burst Strength Results

                           Beating (revolutions of Lampen Mill)
                              1125       4500     12,730
Basis Weight        x        r = 1      r = 2      r = 3       Total

     10            -5         1.73       1.98       3.11
     20            -3         4.99       9.69      12.83
     30            -1         8.74      17.26      22.69
     40             1        12.60      24.52      32.45
     50             3        17.04      31.36      48.22
     60             5        21.18      43.29      59.94

  $6\bar{y}_r$               66.28     128.10     179.24      373.62
  $p_r$                     137.26     278.82     400.08      816.16

A  preliminary  analysis,  as  given  in  Table  8.14,  shows  that  the  three  regression 
lines  of  burst  strength  on  basis  weight  depart  significantly  from  the  origin. 

The sum of squares for the mean regression (including departure from the
origin) is

$$\frac{373.62^2}{18} + \frac{816.16^2}{210} = 10{,}927.092,$$

so that the sum of squares for difference among regressions, with four degrees of
freedom, is

12,487.960  -  10,927.092  =  1,560.868. 

Clearly  this  is  highly  significant,  so  that  the  lines  differ  in  position  or  slope  or 
possibly  in  both. 

TABLE 8.14
Test of Departure of Regression Lines from Origin

                               D.F.    Sum of Squares    Mean Square

Correction for mean              3        8,821.604
Regression through means         3        3,666.356

  Sum                            6       12,487.960
Regressions through origin       3       12,311.932

Departure from origin            3          176.028        58.676**
Residual                        12           28.554         2.3795

** Significant at 1 per cent level.
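
The entries of Table 8.14 can be checked directly from the readings in Table 8.13. The following is a minimal sketch only, assuming that "regressions through origin" means a separate line through the origin of the basis-weight scale for each beating level; the variable names are illustrative and are not the book's notation.

```python
# Burst strength readings from Table 8.13, one list per beating level.
basis_weight = [10, 20, 30, 40, 50, 60]
strength = [
    [1.73, 4.99, 8.74, 12.60, 17.04, 21.18],    # 1125 revolutions
    [1.98, 9.69, 17.26, 24.52, 31.36, 43.29],   # 4500 revolutions
    [3.11, 12.83, 22.69, 32.45, 48.22, 59.94],  # 12,730 revolutions
]
n = len(basis_weight)
w_bar = sum(basis_weight) / n

# Corrected and raw sums of squares of basis weight.
t_w = sum((w - w_bar) ** 2 for w in basis_weight)
sum_w2 = sum(w ** 2 for w in basis_weight)

# Raw total sum of squares of all 18 readings.
total_ss = sum(y ** 2 for row in strength for y in row)

# Correction for the three group means (3 d.f.).
correction = sum(sum(row) ** 2 / n for row in strength)

# Separate regressions through the group means (3 d.f.).
through_means = sum(
    sum((w - w_bar) * y for w, y in zip(basis_weight, row)) ** 2 / t_w
    for row in strength
)

# Separate regressions forced through the origin (3 d.f.).
through_origin = sum(
    sum(w * y for w, y in zip(basis_weight, row)) ** 2 / sum_w2
    for row in strength
)

departure = correction + through_means - through_origin   # 3 d.f.
residual = total_ss - correction - through_means           # 12 d.f.
f_ratio = (departure / 3) / (residual / 12)

print(round(departure, 3), round(residual, 3), round(f_ratio, 2))
```

The variance ratio for departure from the origin is compared with the F distribution on 3 and 12 degrees of freedom.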

In  order  to  compare  the  trends  for  different  amounts  of  beating  it  would  be 
desirable  to  base  the  comparison  on  regression  lines  through  some  common 
point  other  than  the  origin,  if  such  a  model  were  concordant  with  the  data. 
The  common  point  would  represent  the  basis  weight  for  which  the  value  of 
burst  factor  was  independent  of  beating.  The  ordinate  as  well  as  the  abscissa 
of  this  point  will  be  estimated. 

For the independent variable we have taken

$$x = \frac{\text{basis weight} - 35}{5},$$

as shown in the second column of Table 8.13. Its sum of squares is

$$t_x = 70.$$

From the data in Table 8.13 we find

m = 3
n = 6
J = 1066.499
K = 212.5744
L = 42.37452
$JL - K^2$ = 4.51.

Also $\bar{y}$ = 20.76 and b = 816.16/210 = 3.886, so that

$$\bar{y}_c = 20.76 + 3.886\xi.$$

The sums of squares, in terms of the unknown $\xi$ and $\eta$, are set out in the
analysis of variance given in Table 8.15. We need first to test departure from
concurrence. This sum of squares takes its minimum value 0.034 when $\xi$ has
its optimum value -5.02. By comparison with the residual mean square
2.3795 (Table 8.14), this sum of squares is not significant, nor does it attain a
significant value for any value of $\xi$ acceptable as an abscissa of concurrence.
Hence the fitting of concurrent regression lines to the data is valid.

TABLE 8.15

Sums of Squares Required for Tests of Concurrent Regressions

                     D.F.    Sum of Squares

Ordinate of          1       $\dfrac{1{,}260(20.76 + 3.886\xi - \eta)^2}{70 + 6\xi^2}$
concurrence

Abscissa of          1       $\dfrac{420(212.5744\xi^2 + 572.130\xi - 2{,}480.035)^2}{(70 + 6\xi^2)(6{,}398.994\xi^2 - 29{,}760.42\xi + 34{,}605.86)}$
concurrence

Departure from       1       $\dfrac{52.6(70 + 6\xi^2)}{6{,}398.994\xi^2 - 29{,}760.42\xi + 34{,}605.86}$
concurrence

The  point  of  concurrence  is  estimated  by  equating  to  zero  the  sums  of  squares 
for  ordinate  and  abscissa  of  concurrence.  The  value  of  the  abscissa  which 
minimizes  departure  from  concurrence  is  the  appropriate  one  to  take,  and  the 
ordinate  is  the  corresponding  point  on  the  mean  regression  line.    We  find 

xc  =  -5.02 
yc  =     1.26. 
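
The estimate of the point of concurrence can be reproduced from Table 8.13. The sketch below is illustrative only. It assumes that J, K, and L are the sums of squares and products of the group means and group slopes about their averages, scaled as in the comments; these forms are inferred from the printed values J = 1066.499, K = 212.5744, L = 42.37452 and should be checked against the earlier definitions in the chapter. The function and variable names are hypothetical.

```python
import math

# x = (basis weight - 35) / 5 and burst strengths from Table 8.13.
x = [-5, -3, -1, 1, 3, 5]
y = [
    [1.73, 4.99, 8.74, 12.60, 17.04, 21.18],    # r = 1
    [1.98, 9.69, 17.26, 24.52, 31.36, 43.29],   # r = 2
    [3.11, 12.83, 22.69, 32.45, 48.22, 59.94],  # r = 3
]
m, n = len(y), len(x)
x_bar = sum(x) / n
t_x = sum((xi - x_bar) ** 2 for xi in x)                 # 70

y_means = [sum(row) / n for row in y]
p = [sum((xi - x_bar) * yi for xi, yi in zip(x, row)) for row in y]
y_bar = sum(y_means) / m
p_bar = sum(p) / m
b = sum(p) / (m * t_x)                                   # mean slope

# Assumed forms of J, K, L (inferred from the printed values).
J = n * sum((yr - y_bar) ** 2 for yr in y_means)
K = (n / t_x) * sum((yr - y_bar) * (pr - p_bar) for yr, pr in zip(y_means, p))
L = (n / t_x ** 2) * sum((pr - p_bar) ** 2 for pr in p)


def departure_ss(xi):
    """Sum of squares for departure from concurrence at abscissa xi."""
    d = xi - x_bar
    t_prime = t_x + n * d ** 2
    denom = n * d ** 2 * J - 2 * d * t_x * K + (t_x ** 2 / n) * L
    return (t_x * t_prime / n) * (J * L - K ** 2) / denom


# Roots of z^2 K - z (t_x L / n - J) - t_x K / n = 0, with z = xi - x_bar.
c = t_x * L / n - J
disc = math.sqrt(c ** 2 + 4 * (t_x / n) * K ** 2)
roots = [x_bar + (c + sign * disc) / (2 * K) for sign in (1, -1)]

# Keep the root that minimizes departure from concurrence.
xi_hat = min(roots, key=departure_ss)
eta_hat = y_bar + b * (xi_hat - x_bar)

print(round(xi_hat, 2), round(eta_hat, 2), round(departure_ss(xi_hat), 3))
```

The discriminant is written as $c^2 + 4(t_x/n)K^2$, which is algebraically the same as the form involving $JL - K^2$ but avoids subtracting two nearly equal products.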

In order to determine the region in which the point of concurrence may lie,
we consider the sums of squares for ordinate and abscissa of concurrence, which
we may denote by O and A respectively. Then, on the null hypothesis that $\xi$
and $\eta$ are the coordinates of the point of concurrence, the ratio

$$\frac{O + A}{2 \times 2.3795}$$

is distributed as F with two and twelve degrees of freedom. Since the 1 per
cent point of this distribution is 6.9266, the 99 per cent fiducial region for the
point of concurrence is defined by the inequality

$$O + A < 4.7590 \times 6.9266 = 32.964.$$

Using the values of O and A given in Table 8.15, we have calculated the fiducial
boundary. This is plotted in Figure 8.1, together with the concurrent regression
lines.

Of particular interest are the maximum ranges for $\xi$ and $\eta$. The maximum
range for $\xi$ is given when $\eta$ does not depart from the value given by the mean
regression line, so that

$$A < 32.964,$$

giving $\xi$ = -7.0 and -3.7.

The maximum range for $\eta$ is given when $\xi$ takes its optimum value -5.02, so
that

$$O < 32.964;$$

negative values being inadmissible, the limits for $\eta$ are found to be 0.0 and 3.7.
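
These limits can be recovered numerically from the expressions in Table 8.15. The sketch below is illustrative only: the bisection helper and the search brackets (-10 and -2) are assumptions chosen for this example, and the F percentage point 6.9266 is taken from the text rather than computed.

```python
import math

LIMIT = 2 * 2.3795 * 6.9266   # 32.964, the bound on O + A


def t_prime(xi):
    """Sum of squares of x about xi, 70 + 6 xi^2 in this example."""
    return 70 + 6 * xi ** 2


def abscissa_ss(xi):
    """A, the sum of squares for abscissa of concurrence (Table 8.15)."""
    num = 420 * (212.5744 * xi ** 2 + 572.130 * xi - 2480.035) ** 2
    den = t_prime(xi) * (6398.994 * xi ** 2 - 29760.42 * xi + 34605.86)
    return num / den


def bisect(f, lo, hi, iters=100):
    """Root of f on [lo, hi]; assumes f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_lo < 0) == (f_mid < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


xi_hat = -5.02


def excess(xi):
    return abscissa_ss(xi) - LIMIT


# Limits for xi: A = 32.964 on either side of the optimum.
xi_lower = bisect(excess, -10.0, xi_hat)
xi_upper = bisect(excess, xi_hat, -2.0)

# Limits for eta at xi = -5.02: O = 32.964 is a quadratic in eta.
centre = 20.76 + 3.886 * xi_hat
half_width = math.sqrt(LIMIT * t_prime(xi_hat) / 1260)
eta_lower = max(0.0, centre - half_width)   # negative values inadmissible
eta_upper = centre + half_width

print(round(xi_lower, 1), round(xi_upper, 1))
print(round(eta_lower, 1), round(eta_upper, 1))
```

Tracing the full fiducial boundary would require solving O + A = 32.964 for $\eta$ at each $\xi$ in this range; only the two extreme sections are computed here.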

8.6    PROPORTIONALITY  OF  REGRESSION  LINES 

An  interesting  problem  whose  solution  is  similar  to  that  of  fitting 
concurrent  regression  lines  is  fitting  proportional  regressions.  Suppose, 
for  example,  that  different  properties  of  a  coal,  such  as  carbon  content, 
sulphur  content,  and  calorific  value,  are  linearly  related  to  its  ash  content. 
It  might  be  expected  that,  if  the  ash  consisted  of  admixed  impurities,  its 
proportionate  effect  would  be  a  simple  percentage  reduction,  the  same  for 
each of the properties. Hence, if for the jth property the regression
equation on percentage ash were

$$Y_j = a_j + b_j x,$$

we should expect $b_j/a_j$ to be in the neighborhood of $-1/100$. In general,
if the theoretical value of the ratio were $-1/\xi$, we could write the regression
equations

$$Y_j = b_j(x - \xi),$$

and would then be interested in testing, first, the constancy of $\xi$ for the
different variables, and second, the acceptability of various values of $\xi$.
We note that these equations are similar to those in the concurrent
regression problem, except that here $\eta$ vanishes. Also, whereas in fitting
concurrent regressions the data were independent groups of the same
variable, here they are different variables, possibly correlated.

We shall suppose that there are m different variables $y_j$, and that the
variances and covariances, either known or estimated, are proportional
to the elements of a matrix V with typical element $v_{jk}$.

The tests of proportionality are based on the analysis of the variate

$$y_{jc} = \bar{y}_j + b_j(\xi - \bar{x}),$$

which has expected value zero.

The explanatory variate is, as with concurrent regression,

$$p_j' = \sum y_j(x - \xi) = p_j - n\bar{y}_j(\xi - \bar{x}).$$

We now define the quantities

$$J = n\sum_j\sum_k v^{jk}\bar{y}_j\bar{y}_k,$$

$$K = \frac{n}{t_x}\sum_j\sum_k v^{jk}\bar{y}_j p_k,$$

$$L = \frac{n}{t_x^2}\sum_j\sum_k v^{jk}p_j p_k,$$

each with m degrees of freedom. Then the sum of squares of the $y_{jc}$ is

$$\frac{t_x}{t_x'}\left[J + 2(\xi - \bar{x})K + (\xi - \bar{x})^2 L\right] \tag{8.3}$$

with m degrees of freedom, whereas the sum of squares for regression of
$y_{jc}$ on $p_j'$ is

$$\frac{n t_x\left[(\xi - \bar{x})^2 K - (\xi - \bar{x})\left(\dfrac{t_x}{n}L - J\right) - \dfrac{t_x}{n}K\right]^2}{t_x'\left[n(\xi - \bar{x})^2 J - 2(\xi - \bar{x})t_x K + \dfrac{t_x^2}{n}L\right]}. \tag{8.4}$$

We shall call this latter the sum of squares for the constant of
proportionality. The residual sum of squares of $y_{jc}$, measuring departure from
proportionality, is found to be

$$\frac{\dfrac{t_x}{n}(JL - K^2)}{\dfrac{1}{t_x'}\sum_j\sum_k v^{jk}p_j' p_k'} \tag{8.5}$$

with m - 1 degrees of freedom.

If the $v_{jk}$ are known population variances and covariances, the test of
significance is immediate. If they are known apart from a constant
factor, that factor can be estimated from the residual sums of squares and
products of the variables. If, however, as is more likely, the $v_{jk}$ are
themselves the residual sums of squares and products from the unrestricted
regressions, each based on n - 2 degrees of freedom, the sum of squares
of $y_{jc}$ (8.3) is distributed as the ratio of two sums of squares, with m and
n - m - 1 degrees of freedom. Likewise the regression sum of squares
(8.4) and the residual sum of squares (8.5) are distributed as ratios of sums
of squares with 1, n - m - 1 and m - 1, n - m - 1 degrees of freedom
respectively. Hence we may set up the analysis of variance given in
Table 8.16.

The analysis enables us to test a given value of the constant of
proportionality, or to set fiducial limits for it, provided, of course, that
departure from proportionality is not significant. The constant of
proportionality is estimated by equating the sum of products of $y_{jc}$ and $p_j'$
to zero. Since other details are the same as for concurrent regression,
they are not discussed further here. In other contexts, an analysis
similar to this one will provide a test for the constancy of a set of ratios
and fiducial limits for their common expected value.

TABLE 8.16

                                   D.F.          Sum of Squares

Constant of proportionality        1             (8.4)
Departure from proportionality     m - 1         (8.5)
Error                              n - m - 1     1

Total                              n - 1         $1 + \dfrac{n t_x}{t_x'}\sum_j\sum_k v^{jk}y_{jc}y_{kc} = \dfrac{\left|v_{jk} + \dfrac{n t_x}{t_x'}y_{jc}y_{kc}\right|}{\left|v_{jk}\right|}$

8.7    THE  COMPARISON  OF  REGRESSION  EQUATIONS 
FROM  SETS  OF  CORRELATED  DATA 

In  all  these  examples,  the  different  samples  have  been  independent,  so 
that  comparisons  between  them  can  be  made  directly.  For  certain  data, 
however,  correlations  exist  between  the  sets,  usually  because  each  of  the 
variables is affected by some extraneous variable. Yates (1939b) considers
the  case  in  which  the  variables  in  the  different  sets  are  annual  values,  so 
that  correlations  may  be  expected  among  them,  and  he  sets  up  a  suitable 
model  and  derives  a  test  of  significance.  Carter  (1949)  considers  the  case  in 
which  the  corresponding  values  of  the  variables  in  different  sets  are  related 
through  an  additive  constant.  Example  8.3  presents  some  data  for  which 
his  specification  of  the  problem  is  appropriate.  Besides  these  two 
specifications,  there  are  other  possibilities;  in  any  particular  problem, 
care  is  needed  to  see  whether  the  data  obtained  follow  one  of  the  models 
given  above,  or  whether  some  other  is  needed.  A  very  general  discussion 
has  been  given  by  Kullback  and  Rosenblatt  (1957). 

Carter's method will often apply, for example, to data in a double
classification (the classes of which may be designated groups and
treatments), wherein values of the independent variables vary from group to
group, and it is of interest to compare the regressions at different treatment
levels, group effects having been eliminated. It is readily seen that the
analysis of variance can be applied; the effects of groups and treatments
need first to be eliminated, and then the regressions may be determined on
the residual sums of squares and products. This is in fact what Carter's
method reduces to, although in his exposition the derivation is set out in a
different way.

In  a  recent  experiment  on  the  water  absorption  of  a  fibrous  board 
material,  a  number  of  specimens  were  soaked  in  water  for  varying  times, 
and  the  gains  in  thickness  and  weight  recorded.  It  was  of  interest  to 
determine  the  relation  between  these  two  properties  and  to  test  whether 
the  regression  coefficients  differed  significantly  for  different  times.  Now 
for  each  time,  only  a  regression  between  specimens  could  be  determined; 
but  these  cannot  be  validly  compared  without  some  allowance  being  made 
for  the  correlation  introduced  by  the  fact  that  each  regression  was  based 
on  the  same  specimens.  It  is  assumed  that  the  specimen  differences 
introduce  an  additive  effect  which  can  be  eliminated  by  deducting  the 
specimen  means  from  each  of  the  values  for  the  specimen. 

The  question  might  be  raised  whether  we  do  in  fact  want  to  compare  the 
residual  regressions  or  the  between-specimen  regressions  for  different 
times.  It  is  true  that,  if  the  experiment  is  carried  out  for  only  one  time, 
we  can  determine  only  a  regression  between  specimens,  and  that  this 
regression  will  be  of  practical  value  in  predicting  gain  in  thickness  from 
gain  in  weight.  However,  if  our  assumption  is  correct  that  specimen 
effects  are  additive,  it  is  appropriate  to  base  the  comparison  of  regressions 
on  residual  variation.  The  regression  between  specimens  will  still  be 
appropriate  for  prediction  at  any  one  time,  however.  If  the  assumption 
of  additivity  cannot  be  maintained,  some  method  of  analysis  such  as 
Yates's,  which  assumes  a  more  general  form  of  correlation  introduced  by 
group  effects  (in  this  case,  specimen  differences)  is  needed. 

Without  going  into  the  derivation  of  the  method,  we  give  the  normal 
equations  for  the  regression  coefficients. 

The  comparison  is  based  on  an  analysis  of  the  residual  sum  of  squares 
for  the  dependent  variable.  The  reduction  in  this  sum  of  squares,  due  to 
the  fitting  of  a  separate  regression  coefficient  for  each  set,  is  compared 
with  that  resulting  from  the  fitting  of  a  single  regression  for  the  residual 
variation.  Since  the  separate  regression  coefficients  are  correlated,  this 
factor  needs  to  be  allowed  for  in  the  analysis. 

We consider data in m sets, with n results in each set. The notation is as
follows:

$t_{rs}$    sum of products of x values in rth and sth sets
$p_{rs}$    sum of products of x values in rth set with y values in sth set
$u_{rs}$    sum of products of y values in rth and sth sets

$$t_{rs}' = \left(\delta_{rs} - \frac{1}{m}\right)t_{rs}, \qquad
p_{rs}' = \left(\delta_{rs} - \frac{1}{m}\right)p_{rs}, \qquad
u_{rs}' = \left(\delta_{rs} - \frac{1}{m}\right)u_{rs},$$

where $\delta_{rs} = 1$ if $r = s$ and $\delta_{rs} = 0$ if $r \ne s$.

It is readily verified that $\sum_r\sum_s t_{rs}'$, $\sum_r\sum_s p_{rs}'$, and $\sum_r\sum_s u_{rs}'$ give the residual
sums of squares and products of x and y.

If $b_r$ is the residual regression coefficient for the rth set, then

$$\sum_s b_s t_{rs}' = \sum_s p_{rs}' = p_r' \qquad (r = 1, 2, \ldots, m).$$

The solutions of these equations are

$$b_r = \sum_s t'^{rs} p_s' \qquad (r = 1, 2, \ldots, m),$$

where $t'^{rs}$ denotes the elements of the inverse of the matrix $(t_{rs}')$,
and the sum of squares for regression, with m degrees of freedom, is

$$\sum_r b_r p_r'.$$

If a single regression coefficient b is fitted to the data, then

$$b = \sum_r\sum_s p_{rs}' \Big/ \sum_r\sum_s t_{rs}',$$

which is in fact the ratio of the residual sum of products to the residual
sum of squares of x; the corresponding sum of squares is found by
standard methods.

The analysis then takes the form shown in Table 8.17.


TABLE 8.17

                               D.F.                   Sum of Squares

Mean residual regression       1                      $b\sum_r p_r'$
Difference of regressions      m - 1                  by difference

  Separate regressions         m                      $\sum_r b_r p_r'$
Residual, reduced              (n - 1)(m - 1) - m     by difference

  Residual                     (n - 1)(m - 1)         $\sum_r\sum_s u_{rs}'$


Example  8.3  The  Relationship  between  Gain  in  Thickness  and 
Gain  in  Weight  of  "Pinex"  Hardboard,  Soaked  in  Water  for  Various 
Times.  In  this  experiment,  on  studying  the  sorption  properties  of  various 
cellulosic  materials,  eleven  specimens  of  "Pinex"  hardboard  were  soaked  in 
water.  The  gains  in  thickness  (y)  and  in  weight  (x)  were  measured  after  2,  24, 
and  48  hours  and  recorded  as  percentages  of  initial  weight  and  thickness.  It  was 
of  interest  to  find  the  residual  regression  of  y  on  x,  and  to  determine  whether  the 
regression  relationship  varied  at  different  times. 

The original data are presented in Table 8.18. Table 8.19 sets out the
straightforward analysis of covariance of the data, which is, of course, carried out on
the assumption that the residual regressions are equal.


TABLE 8.18

"Pinex" Hardboard: Percentage Gains in Weight (x) and in
Thickness (y) after Soaking

                               Time
Specimen      2 hr.         24 hr.        48 hr.        Total
No.           y     x       y     x       y     x       y     x

 1            8    13      19    35      21    40      48    88
 2            6    11      16    30      17    34      39    75
 3            7    10      18    29      21    36      46    75
 4            6     8      18    26      19    33      43    67
 5            5     7      14    20      18    27      37    54
 6            7    11      17    31      18    36      42    78
 7            8    14      18    37      20    44      46    95
 8            9    15      23    41      23    46      55   102
 9            3     6      13    22      15    30      31    58
10            8    12      17    32      20    39      45    83
11            8    11      17    32      20    38      45    81

Total        75   118     190   335     212   403     477   856

TABLE 8.19

Analysis of Covariance

                    Sums of Squares and Products          y, Adjusted for x
                                                                   Sum of      Mean
             D.F.     $y^2$        $yx$        $x^2$       D.F.    Squares     Square

Specimens     10      130.18      264.24      704.55
Times          2      984.18     1984.73     4028.42        2      13.06       6.53**
S x T         20       15.82       14.94       76.91       19      12.92       0.680

Total         32     1130.18     2263.91     4809.88

** Significant at 1 per cent level.
TABLE 8.20

Matrices of Sums of Squares and Products of Results in
Table 8.18, and Calculations for Testing the Differences
between Residual Regressions

$t_{rs}$
  r
  1         80.18     171.36     151.91
  2        171.36     382.73     342.82
  3        151.91     342.82     318.55

Sum of elements: 2113.64
= 3 x sum of squares for specimens

$p_{rs}$
  r
  1         44.45      59.82      46.82
  2         94.91     138.64     102.64
  3         85.27     124.09      96.09

$t_{rs}'$                                        $p_r'$
  r
  1         53.45     -57.12     -50.64        -5.91
  2        -57.12     255.15    -114.27        26.58
  3        -50.64    -114.27     212.36        -5.73

                                               14.94

Sum of elements: 76.90
= residual sum of squares

Inverse matrix of $(t_{rs}')$, $\times 10^{-6}$

          216,514     94,325    102,386
           94,325     46,256     47,383
          102,386     47,383     54,621

Regression coefficients $b_r$

          0.6409
          0.4005
          0.3414

Mean regression

          0.1943


TABLE 8.21

Analysis of Residual Variance, to Test Differences among
Residual Regressions

                               D.F.    Sum of Squares    Mean Square

Mean residual regression         1          2.90           2.90*
Difference of regressions        2          2.00           1.00(n)

  Separate regressions           3          4.90
Residual, reduced               17         10.92           0.6424

  Residual                      20         15.82

(n) Not significant.
* Significant at 5 per cent level.

In Table 8.20 the calculations for deriving the residual regressions are shown.
The three residual regression coefficients are derived in the usual way. The
mean regression coefficient is smaller than any of the individual coefficients
because of the negative correlations among the residuals.

The analysis of variance in Table 8.21 shows that the residual regressions are
not significantly different.
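
The calculations of Table 8.20 can be retraced from the raw data of Table 8.18. The following sketch is illustrative only and is not Carter's own presentation: it forms $t_{rs}$ and $p_{rs}$, applies the factor $(\delta_{rs} - 1/m)$, and solves the normal equations by plain Gaussian elimination. The helper names `sp`, `primed`, and `solve` are hypothetical.

```python
# Table 8.18: rows are specimens, columns are the 2, 24 and 48 hr. readings.
y = [[8, 19, 21], [6, 16, 17], [7, 18, 21], [6, 18, 19], [5, 14, 18],
     [7, 17, 18], [8, 18, 20], [9, 23, 23], [3, 13, 15], [8, 17, 20],
     [8, 17, 20]]
x = [[13, 35, 40], [11, 30, 34], [10, 29, 36], [8, 26, 33], [7, 20, 27],
     [11, 31, 36], [14, 37, 44], [15, 41, 46], [6, 22, 30], [12, 32, 39],
     [11, 32, 38]]
n, m = len(x), len(x[0])


def sp(a, b, r, s):
    """Sum of products of column r of a with column s of b, about the means."""
    col_a = [row[r] for row in a]
    col_b = [row[s] for row in b]
    return sum(u * v for u, v in zip(col_a, col_b)) - sum(col_a) * sum(col_b) / n


def primed(mat):
    """Apply the factor (delta_rs - 1/m) to every element of mat."""
    return [[((1.0 if r == s else 0.0) - 1.0 / m) * mat[r][s]
             for s in range(m)] for r in range(m)]


def solve(a, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    k = len(rhs)
    aug = [row[:] + [v] for row, v in zip(a, rhs)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, k):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, k + 1):
                aug[i][j] -= factor * aug[col][j]
    sol = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(aug[i][j] * sol[j] for j in range(i + 1, k))
        sol[i] = (aug[i][k] - tail) / aug[i][i]
    return sol


t = [[sp(x, x, r, s) for s in range(m)] for r in range(m)]
p = [[sp(x, y, r, s) for s in range(m)] for r in range(m)]   # x in set r, y in set s
t_res = primed(t)
p_res = [sum(row) for row in primed(p)]                      # p_r'

b_sets = solve(t_res, p_res)                                 # separate coefficients
b_mean = sum(p_res) / sum(sum(row) for row in t_res)         # single coefficient
ss_separate = sum(bi * pi for bi, pi in zip(b_sets, p_res))  # m d.f.
ss_mean = b_mean * sum(p_res)                                # 1 d.f.

print([round(v, 4) for v in b_sets], round(b_mean, 4))
print(round(ss_mean, 2), round(ss_separate - ss_mean, 2))
```

The last line prints the sums of squares for mean residual regression and for difference of regressions, the first two lines of Table 8.21.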

8.8     CALCULATION  OF  AN  OVER-ALL  REGRESSION 
COEFFICIENT  FROM  HETEROGENEOUS  DATA 

A  number  of  sets  of  data,  or  a  single  set  of  heterogeneous  data,  will 
often  provide  several  estimates  of  a  regression  coefficient.  We  have 
already  discussed,  in  Section  8.2,  the  comparison  and  combination  of 
regression  coefficients  from  a  number  of  sets  of  similar  data.  In  other 
cases,  although  the  principles  are  the  same,  the  procedure  is  not  quite 
so  straightforward.  For  example,  maximum  compressive  strength  and 
density  measurements  may  be  made  on  a  number  of  specimens  of  a 
species  of  timber.  In  general,  for  reasons  of  practical  convenience,  such 
a  sample  will  not  be  drawn  directly  at  random  from  the  population,  but 
in  at  least  two  stages :  first,  a  sample  of  trees  will  be  selected,  and  then 
from  each  tree  will  be  taken  a  sample  of  specimens.  Then  an  analysis  of 
variance  and  covariance  can  be  carried  out  on  the  data.  The  regression 
of  maximum  compressive  strength  on  density  may  be  determined  from 
two  sources,  the  sums  of  squares  and  products  between  trees  and  the  sums 
of  squares  and  products  within  trees.  If  the  regression  coefficients  differ 
substantially,  a  test  of  significance  is  needed  to  establish  the  reality  of  the 
difference,  but,  if  the  difference  is  not  significant,  a  combined  regression 
coefficient,  suitably  weighted,  is  needed.  The  weights  for  this  combined 
regression  coefficient  would  be  inversely  proportional  to  the  estimated 
variances  of  the  two  coefficients. 

Sometimes,  however,  even  when  the  regression  coefficients  between 
trees  and  within  trees  differ  significantly,  some  combined  coefficient  is 
required  to  represent  the  relationship  that  would  be  found  in  random 
sampling  from  the  species.  Then  the  weighting  would  be  not  inversely 
proportional  to  the  estimated  variances  but  such  as  to  simulate  the 
results  that  would  be  achieved  by  random  sampling.  It  is  often  found, 
however,  that  this  weighted  regression  coefficient  differs  little  from  the 
over-all  regression  coefficient  derived  from  the  total  line  of  the  analysis 
without  regard  to  differences  between  trees,  as  shown  in  the  following 
example. 

If m trees are sampled, the number of specimens from the rth tree being
$n_r$, totaling n in all, we have the following analysis of variance and
covariance:

                  D.F.       $y^2$     $xy$      $x^2$

Between trees     m - 1      $u_b$     $p_b$     $t_b$
Within trees      n - m      $u_c$     $p_c$     $t_c$

Total             n - 1      $u_0$     $p_0$     $t_0$

The regression coefficient from the within-trees line, for example, would
be

$$b_c = p_c/t_c,$$

and the residual sum of squares, with n - m - 1 degrees of freedom,
would be

$$(n - m - 1)s^2 = u_c - p_c^2/t_c.$$

The variance of $b_c$ would be estimated from the residual mean square $s^2$ as

$$s^2/t_c.$$

Similar results (accurate enough for our purpose, despite the possible
inequality of sampling of different trees) would apply for the between-trees
line of the analysis. Thus the two regression coefficients could be
compared and, if it were appropriate, combined.

The sums of squares and products, adjusted to simulate random
sampling, are found as follows:

$$u_a = u_0 - \left(\sum_r n_r^2 - n\right)u_c \Big/ n(n - m), \text{ etc.}$$

The second term is small if m is large and may usually be ignored. The
divisor is reduced from n - 1 to $\left(n - \sum_r n_r^2/n\right)$ by this adjustment.

Example  8.4  The  Regression  of  Maximum  Compressive  Strength 
on  Density,  for  Specimens  of  Hoop  Pine.  Table  8.22  gives  the  sums  of 
squares  and  products  of  maximum  compressive  strength  and  density,  analyzed 
into  components  between  and  within  trees.  The  total  number  of  specimens 
taken from the 30 trees is 188, and

$$\sum_r n_r^2 = 1346.$$

The  reduced  sums  of  squares  and  mean  squares  are  given  in  the  final  columns  of 
the  table. 

The regression coefficients and their variances are set out in Table 8.23, in
which it is shown that the difference is significant at the 5 per cent level. If the
significance of the difference were ignored, the combined regression coefficient
would be obtained by weighting the two regression coefficients in the ratio
417.0 : 410.1; its variance, presented in the table, is

$$\frac{410.1 \times 417.0}{410.1 + 417.0} = 206.8.$$

TABLE 8.22

Sums of Squares and Products of Maximum Compressive
Strength (lb./sq. in.) (y) and Density (lb./cu. ft.) (x) for
Specimens of Hoop Pine

                       Sums of Squares and Products               y, Reduced

                D.F.     $y^2$          $xy$       $x^2$      D.F.    S.S.          M.S.

Between trees    29      82,857,000    305,790    1,400.3     28     16,080,000    574,300
Within trees    158      47,182,000     78,890      546.8    157     35,800,000    228,000

Total           187     130,039,000    384,680    1,947.1

Adjusted to random
sampling                128,200,000    381,600    1,925.8

$\left(\sum_r n_r^2 - n\right) \Big/ n(n - m) = 0.03898$


TABLE  8.23 

Regression  Coefficients  from  Different  Lines  of  Table  8.22 

b                      Variance  S.E. 

Between  trees                             218.4                      410.1  20.3 

Within  trees                                144.3                      417.0  20.4 


Difference  74.1  827.1  28.8 

t = 74.1/28.8 = 2.57, significant at 5 per cent level

Combined  (weighted)  181.7  206.8  14.4 

Total  197.6  245.0  15.7 

Adjusted  198.2 

The  sums  of  squares  and  products,  adjusted  to  random  sampling,  are  given 
at  the  foot  of  Table  8.22,  from  which  the  adjusted  regression  coefficient  is  derived. 
Its  difference  from  the  total  regression  coefficient  is  clearly  negligible.  The 
variance of the total regression coefficient may be found by noting that the
coefficient is a weighted mean of $b_b$ and $b_c$, with weights in the ratio 1400.3 : 546.8,
or 0.719 : 0.281, so that its variance is

$$410.1 \times 0.719^2 + 417.0 \times 0.281^2 = 245.0.$$
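
The entries of Table 8.23 follow from a few lines of arithmetic on the figures in Tables 8.22 and 8.23. The sketch below is illustrative only; it takes the two coefficients and their variances as printed, and the variable names are hypothetical.

```python
import math

# Coefficients and variances from Table 8.23.
b_between, var_between = 218.4, 410.1
b_within, var_within = 144.3, 417.0

# Test of the difference between the two coefficients.
difference = b_between - b_within
se_difference = math.sqrt(var_between + var_within)
t_ratio = difference / se_difference

# Combined coefficient: weights inversely proportional to the variances.
w_between, w_within = 1 / var_between, 1 / var_within
b_combined = (w_between * b_between + w_within * b_within) / (w_between + w_within)
var_combined = 1 / (w_between + w_within)

# Total coefficient: weights proportional to the sums of squares of x.
t_b, t_c = 1400.3, 546.8
share_between = t_b / (t_b + t_c)
share_within = 1 - share_between
b_total = share_between * b_between + share_within * b_within
var_total = share_between ** 2 * var_between + share_within ** 2 * var_within

# Adjustment to simulate random sampling (Table 8.22).
n, m, sum_nr_sq = 188, 30, 1346
factor = (sum_nr_sq - n) / (n * (n - m))
p_0, t_0, p_c = 384_680, 1947.1, 78_890
b_adjusted = (p_0 - factor * p_c) / (t_0 - factor * t_c)

print(round(t_ratio, 2), round(b_combined, 1), round(var_combined, 1))
print(round(b_total, 1), round(var_total, 1), round(b_adjusted, 1))
```

The closeness of `b_total` and `b_adjusted` illustrates the remark that the adjustment makes little difference when the number of trees is large.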

Methods for the combining of information from different sources may
be extended to multiple regression. The following example shows how,
in a multiple regression, additional information about one of the
coefficients may be included. This question is discussed in more detail by
Durbin (1953).

Example 8.5 The Effect of Load and Rate of Loading on Time to
Failure of Wooden Specimens Loaded in Compression Parallel to
the Grain. An experiment was carried out in which specimens of Queensland
maple (Flindersia brayleyana F.v.M.) were loaded in compression at various
rates of loading, to a predetermined load that was then sustained until the
specimen failed. The object was to study the effects of rate of loading and of
sustained load on the time the specimen took to fail. Since the specimens
varied in their intrinsic strength, the loads were standardized as given percentages
of the estimated load required for immediate failure; these "ultimate loads"
were determined by means of tests on neighboring specimens, with accuracy
sufficient for their errors to be ignored in the analysis.

This  experiment  and  its  analysis  are  fully  described  by  Ditchburne  (1959). 

The specimens were taken from a number of planks; different planks were
allocated to different rates of loading, and the specimens of each plank were
allocated to different percentage loads. However, since not every percentage
was applied to each plank, the variation between planks, as well as that within
planks, contributed some information on the effect of percentage load.

It was found that a satisfactory analysis of the data was given by the regression
of log10 (time to failure in seconds) ($y$) on log10 (percentage load) ($x_1$) and log10
(rate of loading, in lb./min.) ($x_2$). The analysis is therefore carried out as
follows. First, the sums of squares and products of each of the three variables,
between and within planks, are calculated; these are shown in Table 8.24.


TABLE 8.24

Analysis of Covariance of log Percentage Load ($x_1$), log
Rate of Loading ($x_2$), and log Time to Failure ($y$)

                                     Sums of Squares and Products
                  D.F.   $y^2$       $x_1y$      $x_2y$      $x_1^2$    $x_1x_2$    $x_2^2$
Between planks     49     16.5353     107.350    -26.4452    3191.35    -719.400    167.0640
Within planks     246    162.1796    -629.050                3344.64
Total             295    178.7149    -521.700    -26.4452    6535.99    -719.400    167.0640
0.7076B + W              173.8800    -553.089    -18.7126    5602.84    -509.047    118.2145


Then, for each line of the analysis, the reduced sum of squares and mean square
of $y$ are calculated, in order to give estimates of the relative precision of
between-plank and within-plank comparisons. These analyses are shown in Table 8.25.
As a by-product, the partial regression coefficients for each line are determined.
In combining the information about $b_1$ from the between-plank and within-plank
variation, some adjustment needs also to be made to the value of $b_2$, to
allow for the change of $b_1$ from its between-plank estimate, with which the
original value of $b_2$ is associated. This is done by the following method. In
Table 8.25, the final column shows that the variance between planks exceeds that
within planks, the ratio being


0.2531/0.1791 = 1.4132.

TABLE 8.25
Calculation of Reduced Variation for Each Line of Table 8.24

                         Regression                        Reduced variation
                  Coefficients              Sum of               Sum of      Mean
                  $b_1$         $b_2$       Squares      D.F.    Squares     Square
Between planks    -0.069 790    -0.458 820     4.6416     47     11.8937     0.2531
Within planks     -0.188 077                 118.3099    245     43.8697     0.1791

Both these variances are estimated from a substantial number of degrees of
freedom, so that their errors may be ignored. Hence, in combining the
information from variation between planks and within planks, the appropriate weight
for the sums of squares and products between planks is

1/1.4132 = 0.7076.

The values of 0.7076B + W (B and W standing for the corresponding sums of
squares and products between and within planks respectively) are set out at the
foot of Table 8.24. From these combined results the inverse matrix and
regression coefficients are determined, in Table 8.26. It is seen that, as might
be expected, the final value of $b_1$ differs little from its within-plank estimate.
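The weighting step can be sketched as follows. The sums of squares and products are those of Table 8.24; the within-plank entries involving $x_2$ are taken as zero, which is an inference from the fact that the Total line equals the Between line for those columns. The weight is rounded to four decimals, as in the text.

```python
# Sums of squares and products from Table 8.24, in the order
# (y^2, x1*y, x2*y, x1^2, x1*x2, x2^2).
between = [16.5353, 107.350, -26.4452, 3191.35, -719.400, 167.0640]
# Assumption: the x2 terms vanish within planks (Total = Between there).
within = [162.1796, -629.050, 0.0, 3344.64, 0.0, 0.0]

# Residual mean squares from Table 8.25.
ms_between, ms_within = 0.2531, 0.1791

# Weight for the between-plank line, 1/1.4132, rounded as in the text.
weight = round(ms_within / ms_between, 4)

# The "0.7076B + W" line.
combined = [weight * b + w for b, w in zip(between, within)]
print(weight, [round(c, 4) for c in combined])
```

Using the unrounded ratio instead would change the combined entries slightly in the later figures.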


TABLE 8.26

Combination of Between-Plank and Within-Plank Information

Inverse Matrix (from 0.7076B + W line) and Regression Coefficients $b_i$

             $10^6 \times$ inverse matrix        $b_i$
$x_1$          293.1848      1,262.492       -0.185 782
$x_2$        1,262.492      13,895.65        -0.958 294


The standard errors of $b_1$ and $b_2$ are now determined in the usual way; thus

$$\text{S.E.}(b_1) = \sqrt{0.1791 \times 293.1848 \times 10^{-6}} = 0.0072$$
$$\text{S.E.}(b_2) = \sqrt{0.1791 \times 13{,}895.65 \times 10^{-6}} = 0.050.$$
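The calculation behind Table 8.26 and these standard errors may be sketched as below, starting from the combined line of Table 8.24. This is only an arithmetic check under the figures quoted; the variable names are illustrative.

```python
import math

# Combined (0.7076B + W) sums of squares and products from Table 8.24.
s11, s12, s22 = 5602.84, -509.047, 118.2145   # x1^2, x1*x2, x2^2
s1y, s2y = -553.089, -18.7126                 # x1*y, x2*y
residual_ms = 0.1791                          # within planks, Table 8.25

# Inverse of the 2 x 2 matrix of sums of squares and products of x1, x2.
det = s11 * s22 - s12 * s12
c11, c12, c22 = s22 / det, -s12 / det, s11 / det

# Partial regression coefficients and their standard errors.
b1 = c11 * s1y + c12 * s2y
b2 = c12 * s1y + c22 * s2y
se_b1 = math.sqrt(residual_ms * c11)
se_b2 = math.sqrt(residual_ms * c22)

print(round(c11 * 1e6, 4), round(c12 * 1e6, 3), round(c22 * 1e6, 2))
print(round(b1, 6), round(b2, 6), round(se_b1, 4), round(se_b2, 3))
```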


A comprehensive test may also be made by means of an analysis of variance,
as set out in Table 8.27. The sum of squares for regression is found in the
usual way, and the total sum of squares is found at the foot of Table 8.24.
The residual sum of squares is 292 times the residual mean square within planks
from Table 8.25; its actual degrees of freedom are only 245, but formally it is
attributed the 47 + 245 = 292 degrees of freedom for reduced variation from
Table 8.25. The sum of squares for difference of regressions is found by
subtraction, although it may also be found directly from the comparison of the
two regression coefficients $b_1$ in Table 8.25. The difference of regressions is
significant only at the 5 per cent level and may be considered not to invalidate
the determination of the combined regression.

TABLE 8.27
Analysis of Variance (from 0.7076B + W Line)

                              D.F.    Sum of Squares    Mean Square
Combined regression             2        120.6862
Difference of regressions       1          0.9083         0.9083*
Residual                      292         52.2855         0.1791
Total                         295        173.8800

* Significant at 5 per cent level.
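The entries of Table 8.27 can be rebuilt from quantities already tabulated. The sketch below follows the steps described in the text (regression sum of squares in the usual way, residual as 292 times the within-plank mean square, difference by subtraction); the names are ours.

```python
# Quantities quoted in Tables 8.24 to 8.26.
total_ss = 173.8800                   # y^2 on the 0.7076B + W line
b1, b2 = -0.185782, -0.958294         # combined regression coefficients
s1y, s2y = -553.089, -18.7126         # combined sums of products with y
within_ss, within_df = 43.8697, 245   # reduced variation within planks

# Sum of squares for the combined regression.
regression_ss = b1 * s1y + b2 * s2y

# Residual: within-plank mean square times the formal 47 + 245 = 292 d.f.
residual_ms = within_ss / within_df
residual_ss = 292 * residual_ms

# Difference of regressions, found by subtraction (1 degree of freedom).
difference_ss = total_ss - regression_ss - residual_ss
f_ratio = difference_ss / residual_ms

print(round(regression_ss, 4), round(residual_ss, 4), round(difference_ss, 4))
print(round(f_ratio, 2))
```

The variance ratio for the difference of regressions is then referred to the F distribution with 1 and the formal residual degrees of freedom.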

8.9    GENERAL COMMENTS ON HETEROGENEOUS DATA

The  last  two  examples  differ  from  preceding  ones  in  that  the  different 
sources  of  variation  are  of  different  kinds  (between  and  within  trees, 
between  and  within  planks),  whereas  in  earlier  examples  the  sources  of 
variation  were  different  but  of  the  same  kind.  The  applications  of 
regression  analysis  to  heterogeneous  data  of  the  type  discussed  in  the  last 
section  are  limited,  because  it  is  not  often  realistic  to  assume  that  different 
sources  of  variation  are  affecting  the  variables  but  that  the  regression 
relationships are the same or bear some relation to one another.
Furthermore, when the independent variable is subject to random error (as it
will  often  be  in  data  of  this  type),  the  regression  coefficient  is  less  in 
absolute  value  than  the  corresponding  coefficient  in  the  underlying  relation 
(if  linear)  between  the  variables.  The  greater  the  fraction  of  the  total 
variation  which  is  random,  the  greater  this  effect  will  be.  Thus,  in 
heterogeneous  data,  the  same  underlying  relation  may  be  reflected  in 
different  regression  relations.  For  example,  in  Example  8.5,  the  regression 
coefficient  between  planks  would  be  expected  to  exceed  that  within  planks, 
so  that  the  significant  difference  noted  in  the  regression  coefficients  may 
reflect  an  even  greater  difference  in  the  underlying  relation.  This  matter 
will be considered further in Chapter 11. The object of these remarks is
merely  to  draw  attention  to  the  difficulties  that  can  arise  with  heterogeneous 
data;  it  is  important,  when  such  data  are  being  treated,  to  consider 
carefully  the  assumptions  that  are  made. 
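The reduction of the regression coefficient by random error in the independent variable may be made explicit under a simple model, which is our assumption here and is not set out in the text at this point. Suppose the underlying relation is $y = \alpha + \beta\xi + \varepsilon$, and that the observed value is $x = \xi + \delta$, where the error $\delta$ has variance $\sigma_\delta^2$ and is independent of $\xi$ (variance $\sigma_\xi^2$) and of $\varepsilon$. Then the regression coefficient of $y$ on $x$ estimates

```latex
\[
  \frac{\operatorname{Cov}(x, y)}{V(x)}
  = \frac{\beta\,\sigma_\xi^2}{\sigma_\xi^2 + \sigma_\delta^2}
  = \beta\left(1 - \frac{\sigma_\delta^2}{\sigma_\xi^2 + \sigma_\delta^2}\right),
\]
```

so that the coefficient is reduced in absolute value by the fraction of the total variation of $x$ that is random. If this fraction differs between two sources of variation, the same underlying $\beta$ will appear as two different regression coefficients.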


CHAPTER    9 


Simultaneous Regression Equations


9.1    INTRODUCTION 

In  this  chapter  we  consider  the  determination  and  interpretation  of 
simultaneous  equations  fitted  to  experimental  data.  Although  little  has 
been  written  on  simultaneous  equations  in  experimentation,  their  uses  in 
economics  have  frequently  been  discussed.  In  that  field,  however,  there 
is  often  no  distinction  between  dependent  and  independent  variables.  In 
what  is  known  in  econometrics  as  a  complete  system  of  simultaneous 
equations,  there  are  as  many  equations  as  endogenous  variables,  so  that 
the  equations  consist  of  a  linear  transformation  from  the  unknown 
disturbances  and  known  exogenous  variables  to  the  observed  variables. 
The  treatment  of  simultaneous  equations  in  econometrics  is  generally 
troublesome  and  depends  on  the  completeness  of  the  system  of  equations 
and  the  identifiability  of  the  parameters. 

In  experimental  work,  on  the  other  hand,  there  is  in  many  situations  a 
clear  distinction  between  the  dependent  and  independent  variables.  Thus 
the  number  of  equations  will  be  at  most  equal  to  the  number  of  dependent 
variables.  In  this  field,  too,  there  is  a  case  of  particular  interest,  as  will 
be  shown  later,  which  occurs  when  the  numbers  of  dependent  and 
independent  variables  are  the  same.  The  applications  of  simultaneous 
equations  to  experimental  work  are  quite  important  and  are  much  more 
straightforward  than  those  in  econometrics,  yet,  strangely  enough,  they 
have  been  little  discussed.  The  only  published  work  in  this  field  with 
which  we  are  familiar  is  that  of  Box  and  Hunter  (1954),  but  even  this 
relates  to  a  different  situation  from  that  considered  here,  and  to  particular 
applications  in  experimental  design. 

We begin by discussing a simple application of simultaneous equations
to experimental work. Then will follow the mathematical theory, after
which special cases will be discussed.

9.2    A CHEMICAL EXAMPLE

Example  9.1  Simultaneous  Estimation  of  Glucose  and  Galactose 
in  Solution.  Fisher,  Hansen,  and  Norton  (1955)  discuss  the  simultaneous 
quantitative  determination  of  both  glucose  and  galactose  in  solutions  of  unknown 
chemical  composition,  by  means  of  optical  density  measurements.  Without 
going  into  the  technical  details,  which  are  given  in  the  paper  referred  to,  we  can 
simply  state  that  solutions  of  glucose  and  galactose  are  treated  to  develop  a 
color;  the  optical  density  of  the  solution  to  light  of  two  different  wavelengths  is 
then  determined,  and  the  two  data  thus  obtained  are  used  to  estimate  the 
amount  of  each  sugar  in  solution.  It  is  assumed  that,  within  the  range  of 
concentrations  studied,  optical  density  for  each  sugar  is  proportional  to  the 
amount  of  sugar;  then  we  make  use  of  the  fact  that  each  sugar  differs  in  its 
density  to  light  of  different  wavelengths. 

Solutions containing known amounts of glucose and galactose were prepared,
and the density at two different wavelengths (470 and 560 mμ) was determined.
The data enable a regression of density on amount of each sugar to be determined
for each wavelength. These regressions then constitute a calibration of the
apparatus, such that if optical densities for some unknown solution are
substituted in the equations, the amount of each sugar can be estimated.

Thus, if $y_1$ and $y_2$ are the optical densities at 470 and 560 mμ respectively,
and $x_1$ and $x_2$ are the amounts of each sugar (in milligrams), the regression
equations may be written

$$\begin{aligned} Y_1 &= b_{11}x_1 + b_{21}x_2 \\ Y_2 &= b_{12}x_1 + b_{22}x_2. \end{aligned} \tag{9.1}$$

These equations have no constant term, since the optical densities are zero
at zero concentration of the sugars. In the practical use of these equations,
the $y$'s will be observed values and the $x$'s predicted. If the equations are
solved for this purpose, we get

$$\begin{aligned} X_1 &= b^{11}y_1 + b^{21}y_2 \\ X_2 &= b^{12}y_1 + b^{22}y_2 \end{aligned} \tag{9.2}$$

where the matrix

$$\begin{bmatrix} b^{11} & b^{21} \\ b^{12} & b^{22} \end{bmatrix}$$

is the inverse of the original matrix of regression coefficients,

$$\begin{bmatrix} b_{11} & b_{21} \\ b_{12} & b_{22} \end{bmatrix}.$$

The  equations  (9.2)  will  be  called  inverse  regression  equations,  and  the  X 
values  inverse  estimates.  It  will  be  seen  that  in  practically  every  calibration 
problem  inverse  estimates  are  required,  since  the  quantities  arbitrarily  assigned 
in  the  calibration  are  unknown  in  the  application  to  estimation. 

Problems of this kind are of frequent occurrence in quantitative chemical
analysis and in other fields. The determination of the accuracy with
which estimates can be made from such equations is an important practical
problem. We now give the mathematical derivation of sampling errors
and fiducial intervals, before returning to the arithmetical analysis of the
example just discussed.

9.3    SIMULTANEOUS  EQUATIONS  IN  GENERAL 

In general, we may consider that we have $n$ observations on each of $p$
independent variables $x_i$ $(i = 1, 2, \ldots, p)$ and $q$ dependent variables
$y_j$ $(j = 1, 2, \ldots, q)$, and that we must estimate the $y_j$ in terms of the $x_i$ or
vice versa. Then we may determine $q$ regression equations

$$Y_j = \sum_i b_{ij}x_i \qquad (j = 1, 2, \ldots, q), \tag{9.3}$$

in which for simplicity the variables are measured from their means so
that the constant terms vanish.

Besides the standard notation used throughout this book, we introduce
the following notation:

$u_{jk}$        total sum of products of $y_j$ and $y_k$ ($n - 1$ degrees of freedom)

$w_{jk}$        residual sum of products of $y_j$ and $y_k$ ($n - p - 1$ degrees of
freedom)

$U = (u_{jk})$

$W = (w_{jk})$,         $W^{-1} = (w^{jk})$.

Lower case $x_i$ or $y_j$ will denote either observed or potentially observed
(although sometimes actually unknown) quantities, and capital $X_i$ or $Y_j$
will denote estimates based on the observed quantities.

9.4    DIRECT  ESTIMATION 

If one of the $y_j$ or a linear combination of them is to be estimated from
the equations (9.3), the procedure is straightforward. For the estimated
variances of the regression coefficients we have the familiar results

$$(n - p - 1)\,V(b_{ij}) = w_{jj}t^{ii}$$

and, generally,

$$(n - p - 1)\,\mathrm{Cov}(b_{hj}, b_{ik}) = w_{jk}t^{hi} = (n - p - 1)\,\mathrm{Cov}(b_{ij}, b_{hk}).$$

Hence, for the variance of an estimate, we have

$$(n - p - 1)\,V(Y_j) = w_{jj}\left(\frac{1}{n} + \sum_h\sum_i t^{hi}x_hx_i\right)$$

and in general for the covariance of any two estimates,

$$(n - p - 1)\,\mathrm{Cov}(Y_j, Y_k) = w_{jk}\left(\frac{1}{n} + \sum_h\sum_i t^{hi}x_hx_i\right),$$

the term $1/n$ being included to allow for the fact that the variables are
measured from their means.

In order to know how much a new observation $y_j$ will vary about the
predicted value, we need the variance about an estimate, as well as the
variance of the estimate. We have

$$(n - p - 1)\,V(y_j - Y_j) = Hw_{jj}$$

and, generally,

$$(n - p - 1)\,\mathrm{Cov}(y_j - Y_j,\; y_k - Y_k) = Hw_{jk}$$

where

$$H = 1 + \frac{1}{n} + \sum_h\sum_i t^{hi}x_hx_i. \tag{9.4}$$
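The factor $H$ of (9.4) is simple to evaluate once the inverse matrix $(t^{hi})$ is available. The following sketch uses hypothetical numbers throughout; the function names are ours and are introduced only for illustration.

```python
def prediction_factor(n, t_inv, x):
    """H = 1 + 1/n + sum_h sum_i t^{hi} x_h x_i, as in (9.4).

    n     : number of observations used in fitting the regressions
    t_inv : inverse of the matrix of sums of squares and products of the x's
    x     : values of the independent variables, measured from their means
    """
    size = len(x)
    quad = sum(t_inv[h][i] * x[h] * x[i]
               for h in range(size) for i in range(size))
    return 1.0 + 1.0 / n + quad


def variance_about_estimate(w_jj, n, p, t_inv, x):
    """Estimated V(y_j - Y_j) = H * w_jj / (n - p - 1)."""
    return prediction_factor(n, t_inv, x) * w_jj / (n - p - 1)


# Hypothetical inputs: n = 20 observations, p = 2 independent variables.
t_inv = [[4.0, -1.0], [-1.0, 4.0]]
h_at_mean = prediction_factor(20, t_inv, [0.0, 0.0])
h_off_mean = prediction_factor(20, t_inv, [0.5, -0.5])
print(h_at_mean, h_off_mean)
```

The factor is smallest at the point of means and grows as the point of prediction moves away from it.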

If it is required to estimate a linear combination of the $y_j$, for example,

$$y_a = \sum_j a_jy_j,$$

the regression coefficients are linear combinations of the original
coefficients, namely,

$$b_{ia} = \sum_j a_jb_{ij},$$

and the regression equation for estimating $y_a$ may be written

$$Y_a = \sum_i b_{ia}x_i.$$

The variance of an estimate is given by

$$(n - p - 1)\,V(Y_a) = \sum_j\sum_k a_ja_kw_{jk}\left(\frac{1}{n} + \sum_h\sum_i t^{hi}x_hx_i\right).$$

A special case of a linear combination of the dependent variables is
Hotelling's "most predictable criterion." For a linear combination with
coefficients $a_j$ the residual sum of squares after fitting the regression on
the $x_i$ is

$$\sum_j\sum_k a_ja_kw_{jk} \tag{9.5}$$

and the total sum of squares is

$$\sum_j\sum_k a_ja_ku_{jk}. \tag{9.6}$$

The linear combination which minimizes the ratio of (9.5) to (9.6) will
clearly be an estimate of that linear combination least affected by departure
from regression; Hotelling has designated it the most predictable criterion.
The coefficients $a_j$ will be found as one of the latent vectors of the matrix
$W^{-1}U$. Whether this linear combination has any relevance to the
interpretation of the data will depend on the nature of the problem.
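For two dependent variables the latent vector can be written down in closed form. The sketch below is restricted to 2 x 2 matrices, uses hypothetical values of $W$ and $U$, and assumes that the off-diagonal element of $W^{-1}U$ is not zero; it takes the latent vector belonging to the largest latent root of $W^{-1}U$, which is the one that makes the ratio of (9.5) to (9.6) smallest.

```python
import math


def quad(m, a):
    """Quadratic form a'Ma for a 2 x 2 matrix m."""
    return sum(m[j][k] * a[j] * a[k] for j in range(2) for k in range(2))


def most_predictable_2x2(w, u):
    """Coefficients a minimizing a'Wa / a'Ua, and the minimized ratio."""
    det_w = w[0][0] * w[1][1] - w[0][1] * w[1][0]
    w_inv = [[w[1][1] / det_w, -w[0][1] / det_w],
             [-w[1][0] / det_w, w[0][0] / det_w]]
    # m = W^{-1} U
    m = [[sum(w_inv[r][k] * u[k][c] for k in range(2)) for c in range(2)]
         for r in range(2)]
    trace = m[0][0] + m[1][1]
    det_m = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = max(trace * trace - 4.0 * det_m, 0.0)
    largest_root = 0.5 * (trace + math.sqrt(disc))
    # Latent vector for the largest root (assumes m[0][1] != 0).
    a = [m[0][1], largest_root - m[0][0]]
    return a, quad(w, a) / quad(u, a)


# Hypothetical residual (W) and total (U) matrices.
w = [[2.0, 1.0], [1.0, 3.0]]
u = [[10.0, 4.0], [4.0, 8.0]]
a, ratio = most_predictable_2x2(w, u)
print(a, ratio)
```

For more than two dependent variables a general latent-root routine would be needed; none is assumed here.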

9.5    INVERSE  ESTIMATION 

As  mentioned  earlier,  we  are  most  often  interested  in  using  a  set  of 
simultaneous  regression  equations  inversely  for  estimating  values  of  the 
independent  variables  from  observed  values  of  the  dependent  variables. 
This  situation  arises  frequently,  for  example,  in  calibration  experiments,  as 
Example  9.1  shows.  Now  in  order  that  the  regression  equations  may  be 
solved  for  the  independent  variables,  it  is  necessary  that  the  number  of 
equations  equal  the  number  of  independent  variables.  If  there  are  fewer 
equations  than  independent  variables,  they  cannot  be  solved,  and  all  that 
can  be  determined  are  certain  relationships  among  the  estimated  values  of 
the  independent  variables.  On  the  other  hand,  if  there  are  more  equations 
than  unknown  independent  variables,  we  have  redundant  information; 
however,  by  an  adaptation  of  the  method  of  least  squares,  valid  estimates 
of  the  unknowns  may  be  determined.  Then  the  discrepancies  of  the 
individual  equations  from  these  estimates  provide  a  measure  of  the 
consistency  of  the  different  equations  and  hence  of  the  different  dependent 
variables.    We  shall  consider  each  of  these  cases  in  turn. 

(i) p = q

Here the regression equations (9.3) may be solved directly to give the
estimates of the $x_i$, which we denote, without risk of confusion with
direct estimates, by $X_i$. The solutions are

$$X_i = \sum_j b^{ji}y_j,$$

where the $b^{ji}$ are the elements of the matrix inverse to the square matrix

$$B = (b_{ij}).$$

We note that in the matrix $B$ rows correspond to $x$ variables and
columns to $y$ variables, whereas in $B^{-1}$ rows correspond to $y$ variables and
columns to $x$ variables. Thus, for either direct or inverse regression
equations, the regression coefficients corresponding to any predictand
are read down the columns.

We shall show below how tolerance limits for values, corresponding to
the estimates $X_i$, may be determined by means of the F test. First of all,
however, it is of interest to determine approximate standard errors for
these estimates. These standard errors will be applicable when the
estimated regression coefficients are large compared with their standard
errors, and the inverse regression coefficients are likewise large compared
with their standard errors. This second condition requires, in particular,
that the matrix $B$ be not almost singular.

Now we have

$$BB^{-1} = I;$$

hence, on taking differentials and multiplying the results by $B^{-1}$, we find

$$dB^{-1} = -B^{-1}(dB)B^{-1}, \tag{9.7}$$

whence

$$db^{ji} = -\sum_h\sum_k b^{jh}b^{ki}\,db_{hk}. \tag{9.8}$$

The equations (9.7) and (9.8) represent a linear transformation of the
differentials $db_{hk}$. Taking the direct product (van der Waerden, 1931) of
such a transformation and its transposed, we have

$$dB^{-1} \times dB'^{-1} = (B^{-1}(dB)B^{-1}) \times (B'^{-1}(dB')B'^{-1}) = (B^{-1} \times B'^{-1})(dB \times dB')(B^{-1} \times B'^{-1}). \tag{9.9}$$

Now each of the direct products in equation (9.9) is a $p^2 \times p^2$ matrix
whose typical elements are products of two regression coefficients or
differentials. For instance, the typical element of $dB \times dB'$ is

$$db_{ij}\,db_{i'j'}.$$

If we take expectations of each side of equation (9.9), we obtain on the
left-hand side the matrix of variances and covariances of the $b^{ji}$, whereas on
the right-hand side the middle factor gives the matrix of variances and
covariances of the $b_{ij}$. Now, as we have seen, the appropriate estimate of
the covariance of $b_{ij}$ and $b_{i'j'}$ is

$$t^{ii'}w_{jj'}/(n - p - 1).$$

If we make a suitable permutation of rows and columns, the expected
value of the middle factor therefore becomes the direct product

$$T^{-1} \times W/(n - p - 1),$$

of two $p \times p$ matrices.

After further suitable permutations of rows and columns, we find for the
estimated expected value of the left-hand side

$$(B^{-1} \times B'^{-1})(T^{-1} \times W)(B'^{-1} \times B^{-1})/(n - p - 1) = (B^{-1}T^{-1}B'^{-1}) \times (B'^{-1}WB^{-1})/(n - p - 1) = M^{-1} \times Q^{-1}/(n - p - 1)$$

where $M = B'TB$

and $Q = BW^{-1}B'$,

so that $M^{-1} = B^{-1}T^{-1}B'^{-1}$

and $Q^{-1} = B'^{-1}WB^{-1}$.

This result gives in particular

$$(n - p - 1)\,\mathrm{Cov}(b^{ji}, b^{j'i'}) = m^{jj'}q^{ii'} = \sum_h\sum_{h'} t^{hh'}b^{jh}b^{j'h'} \sum_k\sum_{k'} w_{kk'}b^{ki}b^{k'i'} = (n - p - 1)\,\mathrm{Cov}(b^{ji'}, b^{j'i}).$$

These results are, of course, approximate and will often be inaccurate;
their interest lies in the fact that the expressions found are similar to those
occurring in the exact analysis.

We may now determine approximate variances and covariances of
estimates $X_i$ based on observations $y_j$:

$$\begin{aligned}
(n - p - 1)\,V(X_i) &= (n - p - 1)\,V\Big(\sum_j b^{ji}y_j\Big) \\
&= \Big(1 + \frac{1}{n}\Big)\sum_j\sum_{j'} w_{jj'}b^{ji}b^{j'i} + \sum_j\sum_{j'} y_jy_{j'}\sum_h\sum_{h'} t^{hh'}b^{jh}b^{j'h'}\sum_k\sum_{k'} w_{kk'}b^{ki}b^{k'i} \\
&= \sum_j\sum_{j'} w_{jj'}b^{ji}b^{j'i}\Big(1 + \frac{1}{n} + \sum_h\sum_{h'} t^{hh'}X_hX_{h'}\Big).
\end{aligned} \tag{9.10}$$

This result follows from the formula for the approximate variance of a
product, and from the fact that the $b^{ji}$ and the $y_j$ are independent.
Similarly, to the same degree of approximation,

$$(n - p - 1)\,\mathrm{Cov}(X_i, X_{i'}) = \sum_j\sum_{j'} w_{jj'}b^{ji}b^{j'i'}\Big(1 + \frac{1}{n} + \sum_h\sum_{h'} t^{hh'}X_hX_{h'}\Big).$$

The covariance matrix of the $X_i$ may be written $HQ^{-1}/(n - p - 1)$
where $Q^{-1}$ again equals $B'^{-1}WB^{-1}$, and $H$ is a function of the estimates
rather than of observed values as defined in (9.4). It may be noted that
these results are analogous to those found in direct estimation of an
observation $y_j$. There we have

$$(n - p - 1)\,V(y_j - Y_j) = w_{jj}H$$

and

$$(n - p - 1)\,\mathrm{Cov}(y_j - Y_j,\; y_{j'} - Y_{j'}) = w_{jj'}H$$

where the $x_h$ are now observed quantities, the $Y_j$ are regression estimates,
and the $y_j$ are new observations, not used in determining the regression.

The exact determination of sampling variation is not much more
complicated. We may find simultaneous fiducial limits for the unknown
quantities $x_i$ in the following way. The ratio

$$\frac{n - 2p}{p} \cdot \frac{\sum_j\sum_k w^{jk}\big(y_j - \sum_i b_{ij}x_i\big)\big(y_k - \sum_i b_{ik}x_i\big)}{H}$$

is distributed as F with $p$ and $n - 2p$ degrees of freedom. By substituting
various sets of values of the $x_i$ in the formula, we can determine for which
sets the associated value of F is nonsignificant, and hence which sets are
concordant with the data. The range of concordant sets of the $x_i$ defines a
fiducial region for the values.
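The substitution of trial values may be sketched as follows for the case $p = q$. All the numbers are hypothetical and the function name is ours; the tabulated value of F with which the ratio is to be compared is left to the reader, since it is not computed here.

```python
def fiducial_ratio(n, b, w_inv, t_inv, y, x):
    """Ratio distributed as F with p and n - 2p d.f. when p = q.

    b[i][j] is the coefficient of x_i in the equation for y_j;
    w_inv and t_inv are the inverses of W and T; x is a trial set of values.
    """
    p = len(x)
    resid = [y[j] - sum(b[i][j] * x[i] for i in range(p)) for j in range(p)]
    quad = sum(w_inv[j][k] * resid[j] * resid[k]
               for j in range(p) for k in range(p))
    h = 1.0 + 1.0 / n + sum(t_inv[r][s] * x[r] * x[s]
                            for r in range(p) for s in range(p))
    return (n - 2 * p) / p * quad / h


# Hypothetical calibration with p = q = 2 and n = 30.
n = 30
b = [[1.0, 0.5], [0.2, 2.0]]
w_inv = [[50.0, -10.0], [-10.0, 30.0]]
t_inv = [[4.0, -1.0], [-1.0, 4.0]]

# Observations generated exactly from x = (1, 2), so the estimates X equal x.
x_solution = [1.0, 2.0]
y = [sum(b[i][j] * x_solution[i] for i in range(2)) for j in range(2)]

f_at_solution = fiducial_ratio(n, b, w_inv, t_inv, y, x_solution)
f_nearby = fiducial_ratio(n, b, w_inv, t_inv, y, [1.1, 2.0])
print(f_at_solution, f_nearby)
```

The ratio vanishes at the inverse estimates themselves and increases as the trial values move away from them; the fiducial region consists of those trial sets for which it remains below the chosen significance point.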
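Now, since we may write

$$y_j = \sum_i b_{ij}X_i,$$

the $y_j$ being observations and the $X_i$ estimates, we have

$$\begin{aligned}
\sum_j\sum_k w^{jk}\Big(y_j - \sum_i b_{ij}x_i\Big)\Big(y_k - \sum_i b_{ik}x_i\Big) &= \sum_h\sum_i\sum_j\sum_k (X_h - x_h)(X_i - x_i)\,w^{jk}b_{hj}b_{ik} \\
&= \sum_h\sum_i (X_h - x_h)(X_i - x_i)\,q_{hi}
\end{aligned} \tag{9.11}$$

where

$$q_{hi} = \sum_j\sum_k w^{jk}b_{hj}b_{ik}.$$

Since $q_{hi}$ is a typical element of the matrix

$$Q = BW^{-1}B',$$

expression (9.11) may be written

$$(X - x)BW^{-1}B'(X' - x').$$

Hence the simultaneous fiducial limits for the values $x_i$ are given by the
solution (if real) of

$$F = \frac{n - 2p}{p} \cdot \frac{\sum_h\sum_i q_{hi}(X_h - x_h)(X_i - x_i)}{H}.$$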

If limits for a single value $x_h$ are required, we have

$$V(X_h) = Hq^{hh}/(n - p - 1). \tag{9.12}$$

Now since

$$Q = BW^{-1}B',$$

so that

$$Q^{-1} = B'^{-1}WB^{-1},$$

$$q^{hh} = \sum_j\sum_k w_{jk}b^{jh}b^{kh}.$$

Hence, with 1 and $n - p - 1$ degrees of freedom,

$$F = \frac{(n - p - 1)(X_h - x_h)^2}{H\sum_j\sum_k w_{jk}b^{jh}b^{kh}}.$$

Note that the variance estimate given by (9.12) differs from the approximate
estimate given in (9.10) by the replacement of calculated quantities $X_i$ by
unknowns $x_i$. In practice, since the $x_i$ are unknown, the approximate
variance estimate based on the $X_i$ would need to be used to give fiducial
limits for a single $x_h$.

(ii) p < q

We have now more equations than unknowns. We have a choice
either of omitting $q - p$ of the equations (provided we can decide from
prior considerations which are least useful), or of using the additional
information given by the equations to test the consistency of the
relationships involving the different dependent variables. This latter aspect is
the one that we shall examine.

If an observation of a set $y_j$ $(j = 1, 2, \ldots, q)$ of dependent variables is to
be used to estimate a set $x_i$ $(i = 1, 2, \ldots, p)$, we may so determine the
estimate that it has minimum (estimated) variance. Now since the
estimated covariance of $y_j$ and $y_k$ is proportional to $w_{jk}$, the quantity to be
minimized, with respect to the $x_i$, is

$$\sum_j\sum_k w^{jk}\Big(y_j - \sum_i b_{ij}x_i\Big)\Big(y_k - \sum_i b_{ik}x_i\Big).$$

If we put, as in (i),

$$Q = BW^{-1}B',$$

so that

$$q_{hi} = \sum_j\sum_k w^{jk}b_{hj}b_{ik},$$

and also put

$$P = BW^{-1}y,$$

so that

$$p_i = \sum_j\sum_k w^{jk}b_{ij}y_k,$$

we find for the normal equations

$$QX = P,$$

that is,

$$\sum_h q_{ih}X_h = p_i,$$

so that

$$X = Q^{-1}P$$

or

$$X_h = \sum_i q^{hi}p_i.$$

These results are similar to those found for the case $p = q$, except that
here the matrix $B$ does not possess an inverse, so that the estimates need to
be expressed in terms of the matrices $P$ and $Q$.

As in the case $p = q$, the estimated covariance matrix of the $X_i$ is

$$\frac{H}{n - p - 1}\,Q^{-1}.$$

Now we may test the consistency of the $q$ equations by means of the
departures of the observed $y_j$ values from the estimates provided by
inserting the $X_i$ in the equations. The criterion is

$$\frac{n - p - q}{(q - p)H}\sum_j\sum_k w^{jk}\Big(y_j - \sum_i b_{ij}X_i\Big)\Big(y_k - \sum_i b_{ik}X_i\Big),$$

which is distributed as F with $q - p$ and $n - p - q$ degrees of freedom.
This may be written in the alternative forms:

$$\frac{n - p - q}{(q - p)H}\Big(\sum_j\sum_k w^{jk}y_jy_k - \sum_h\sum_i q_{hi}X_hX_i\Big)$$

$$\frac{n - p - q}{(q - p)H}\Big(\sum_j\sum_k w^{jk}y_jy_k - \sum_h\sum_i q^{hi}p_hp_i\Big).$$

If the value of F is not significant, there is no evidence for regarding the
equations as inconsistent, and fiducial limits may be determined for the $x_i$.
For these we have

$$F = \frac{n - p - q}{pH}\sum_h\sum_i q_{hi}(X_h - x_h)(X_i - x_i)$$

with $p$ and $n - p - q$ degrees of freedom. This may be written in the
alternative form

$$F = \frac{n - p - q}{pH}\sum_h\sum_{h'} q^{hh'}\Big(p_h - \sum_i q_{hi}x_i\Big)\Big(p_{h'} - \sum_i q_{h'i}x_i\Big).$$

By means of this criterion, the concordance of any set of $x_i$ with the data
may be established.

In the particular case when $p = 1$, the solution of the equations gives
the discriminant function for assigning a value of $x_1$ on the basis of
observations of the $q$ variables $y_1, y_2, \ldots, y_q$.

The discriminant function is

$$X_1 = p_1/q_{11} = \frac{\sum_j\sum_k w^{jk}b_{1j}y_k}{\sum_j\sum_k w^{jk}b_{1j}b_{1k}}.$$

To test the consistency of any set of observations $y_j$, the criterion is

$$\frac{n - q - 1}{q - 1} \cdot \frac{\sum_j\sum_k w^{jk}(y_j - b_{1j}X_1)(y_k - b_{1k}X_1)}{1 + \dfrac{1}{n} + \dfrac{X_1^2}{t_{11}}}$$

with $q - 1$ and $n - q - 1$ degrees of freedom.

It should be remarked here that this is a test not of the discriminant
function, which has been established from previous data, but of the
consistency of the present set of observations. A significant result may
indicate either that the values of $y_j$ are not consistent among themselves,
or that the discriminant function determined from previous data does not
apply to the present observations.
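The case $p = 1$ lends itself to a short numerical sketch. The coefficients $b_{1j}$ and the matrix $(w^{jk})$ below are hypothetical, as is the function name; the second returned value is the quadratic form in the departures of the $y_j$ from the fitted values, which forms the numerator of the consistency criterion.

```python
def discriminant_estimate(b1, w_inv, y):
    """Return (X_1, residual quadratic form) for p = 1.

    b1[j]  : regression coefficient of y_j on x_1
    w_inv  : inverse of the residual matrix W
    y      : observed values of the q dependent variables
    """
    q = len(y)
    p1 = sum(w_inv[j][k] * b1[j] * y[k] for j in range(q) for k in range(q))
    q11 = sum(w_inv[j][k] * b1[j] * b1[k] for j in range(q) for k in range(q))
    x1 = p1 / q11
    resid = [y[j] - b1[j] * x1 for j in range(q)]
    rss = sum(w_inv[j][k] * resid[j] * resid[k]
              for j in range(q) for k in range(q))
    return x1, rss


# Hypothetical calibration with q = 2 dependent variables.
b1 = [2.0, 3.0]
w_inv = [[4.0, -1.0], [-1.0, 2.0]]

# Observations that agree exactly with x_1 = 1.5, and a set that does not.
x_consistent, rss_consistent = discriminant_estimate(b1, w_inv, [3.0, 4.5])
x_other, rss_other = discriminant_estimate(b1, w_inv, [3.0, 5.0])
print(x_consistent, rss_consistent)
print(x_other, rss_other)
```

When the two observations point to the same value of $x_1$ the residual form vanishes; otherwise it is positive, and its size relative to $H$ is what the F criterion assesses.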

(iii) p > q

Here we have fewer equations than unknowns, so that estimates of the
unknown $x_i$ cannot be determined. The most that can be done is to find
a relationship among $p - q + 1$ of the estimates $X_i$. In many cases such
a relationship may be all that is required. An example of such a
relationship has already been given in Example 6.2.

Suppose that we wish to eliminate $X_1, X_2, \ldots, X_{q-1}$ and to determine
the relationship among $X_q, X_{q+1}, \ldots, X_p$. The determinant of the first
$q - 1$ rows of $B$ and the $q - 1$ columns resulting from omitting column $j$
will be denoted by $(-1)^{j-1}D_j$. Then it is readily shown that the required
relationship is

$$\sum_{j=1}^{q}\sum_{i=q}^{p} D_jb_{ij}X_i = \sum_{j=1}^{q} D_jy_j.$$

The fiducial limits for the corresponding relationship among the $x_i$ can
be determined only approximately.

The fact that $p$ exceeds $q$ does not, however, prevent simultaneous
fiducial limits for the $x_i$ from being found. The criterion, distributed as F
with $q$ and $n - p - q$ degrees of freedom, from which simultaneous
fiducial limits may be derived, is

$$\frac{n - p - q}{qH}\sum_j\sum_k w^{jk}\Big(y_j - \sum_i b_{ij}x_i\Big)\Big(y_k - \sum_i b_{ik}x_i\Big) = \frac{n - p - q}{qH}\sum_h\sum_i q_{hi}(X_h - x_h)(X_i - x_i)$$

where $q_{hi}$ is the typical element of the matrix $Q$ defined in (i) and (ii).
Here $Q$, although it is a $p \times p$ matrix, is of rank $q$.

9.6    DISCUSSION OF THE CHEMICAL EXAMPLE

Example  9.1  (Continued  from  Section  9.2).  The  original  data  of  the 
experiment  discussed  in  Section  9.2  are  given  by  Fisher,  Hansen,  and  Norton 
(1955)  in  their  Table  I  and  are  not  reproduced  here. 

Fisher et al. fitted quadratic regression equations to their data, but since we
found that the quadratic terms were significant only at the 5 per cent level for
optical densities at 560 mμ ($y_2$), we have ignored these terms and fitted only
linear regressions. The analyses of variance and covariance of $y_1$ and $y_2$ are
shown in Table 9.1, and the $B$, $T$, and $W$ matrices and their inverses in Table 9.2.


TABLE 9.1

Analyses of Variance and Covariance of Optical Density
Measurements at 470 mμ ($y_1$) and at 560 mμ ($y_2$) (Fisher,
Hansen, and Norton's Data)

                                      Sums of Squares and Products
                             D.F.    $y_1^2$       $y_1y_2$      $y_2^2$
Regression on $x_1$, $x_2$      2     2.570 253    4.207 267    6.995 805
Residual                      26     0.003 167    0.002 996    0.006 733
Total                         28     2.573 420    4.210 263    7.002 538


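TABLE 9.2

Matrices of Sums of Squares and Products and of Regression
Coefficients

$$T = \begin{bmatrix} 0.2500 & 0.0750 \\ 0.0750 & 0.2500 \end{bmatrix}, \qquad T^{-1} = \begin{bmatrix} 4.3956 & -1.3187 \\ -1.3187 & 4.3956 \end{bmatrix}$$

$$10^6\,W = \begin{bmatrix} 3167 & 2996 \\ 2996 & 6733 \end{bmatrix}, \qquad W^{-1} = \begin{bmatrix} 545.3 & -242.6 \\ -242.6 & 256.5 \end{bmatrix}$$

$$B = \begin{bmatrix} 1.2166 & 1.3465 \\ 2.6240 & 4.7276 \end{bmatrix}, \qquad B^{-1} = \begin{bmatrix} 2.1311 & -0.6070 \\ -1.1829 & 0.5484 \end{bmatrix}$$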

Thus  we  see  from  Table  9.2  that  the  direct  regression  equations  are 
Y1=  1.2166a?!  +  2.6240*2 
Y2  =  1.3465^  +4.7276*2 
and  the  inverse  equations  are 

Xx  =      2.1311?/!  -  1-18292/2 
X2  =  -0.6070i/i  +  0.5484j/2 
in  agreement  with  the  results  of  Fisher  et  al. 
The direct equations are less useful than the inverse ones. Since, in this
example, the numbers of dependent and independent variables are equal, no
test for consistency is possible, but we can derive fiducial limits for the values
of $x_1$ and $x_2$ corresponding to observed values $y_1$ and $y_2$.

Since the inverse regression coefficients, as well as the direct coefficients,
are likely to be well determined, we may calculate their approximate standard
errors. Table 9.3 gives the matrices $B^{-1} T^{-1} B'^{-1}$ and $B'^{-1} W B^{-1}$
required in these calculations.


TABLE 9.3
Matrix Products Required in Estimating Variances

$B^{-1} T^{-1} B'^{-1}$                 $10^6 \, B'^{-1} W B^{-1}$

[  24.995   -15.032 ]                   [  8699   -2812 ]
[ -15.032     9.183 ]                   [ -2812    1197 ]


Then, for example, the variance of b21 is obtained using the second diagonal
term of $B^{-1} T^{-1} B'^{-1}$ and the first diagonal term of $B'^{-1} W B^{-1}$:

$$9.183 \times 10^{-6} \times 8699 / 26 = 0.003\,072,$$

so that the standard error of b21 is 0.055. The standard errors of the coefficients
may be set out as follows:

[ 0.091   0.034 ]
[ 0.055   0.021 ]

For  general  purposes,  of  course,  the  covariances  as  well  as  the  variances  will 
be  of  interest. 
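The standard-error calculation above can be sketched as follows, assuming NumPy. The names `U`, `V`, and `resid_df` are hypothetical labels for the two matrix products of Table 9.3 and the residual degrees of freedom of Table 9.1; they are not notation from the text.

```python
import numpy as np

# Matrix products from Table 9.3 (U and V are illustrative names).
U = np.array([[24.995, -15.032],
              [-15.032, 9.183]])           # B^-1 T^-1 B'^-1
V = 1e-6 * np.array([[8699, -2812],
                     [-2812, 1197]])       # B'^-1 W B^-1
resid_df = 26                              # residual degrees of freedom

# Variance in position (i, j): i-th diagonal term of U times
# j-th diagonal term of V, divided by the residual degrees of freedom.
variances = np.outer(np.diag(U), np.diag(V)) / resid_df
std_errors = np.sqrt(variances)
print(np.round(std_errors, 3))
```

The element in the second row and first column corresponds to the worked case for b21. Covariances between coefficients would need the off-diagonal terms as well, which this sketch does not attempt.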

In determining the approximate variance of an estimate $X_i$, since the regression
is through the origin rather than the point of means, the actual values of the
$y_j$ rather than departures from means are used, and the term $1/n$ is omitted from
the variance estimates, in equation (9.10).

Thus, approximately,

$$V(X_1) = \frac{10^{-6} \times 8699}{26} \, (1 + 24.99 y_1^2 - 30.06 y_1 y_2 + 9.18 y_2^2)$$

$$\phantom{V(X_1)} = \frac{10^{-6} \times 8699}{26} \, (1 + 4.396 X_1^2 - 2.637 X_1 X_2 + 4.396 X_2^2)$$

with similar results for Cov($X_1$, $X_2$) and $V(X_2)$.
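A minimal sketch of the two equivalent forms of the approximate variance of X1 is given below. The function names and the observed densities 0.5 and 0.9 are hypothetical, chosen only to show that the form in y1, y2 and the form in X1, X2 agree up to rounding of the printed coefficients.

```python
def var_x1_from_y(y1, y2, resid_df=26):
    """Approximate V(X1) from observed optical densities y1, y2."""
    scale = 1e-6 * 8699 / resid_df
    return scale * (1 + 24.99 * y1**2 - 30.06 * y1 * y2 + 9.18 * y2**2)


def var_x1_from_x(x1, x2, resid_df=26):
    """The same variance expressed through the estimates X1, X2."""
    scale = 1e-6 * 8699 / resid_df
    return scale * (1 + 4.396 * x1**2 - 2.637 * x1 * x2 + 4.396 * x2**2)


# Hypothetical observed densities, used only for illustration.
y1, y2 = 0.5, 0.9

# Inverse regression equations of Section 9.6.
x1 = 2.1311 * y1 - 1.1829 * y2
x2 = -0.6070 * y1 + 0.5484 * y2

v_y = var_x1_from_y(y1, y2)
v_x = var_x1_from_x(x1, x2)
print(v_y, v_x)
```

The quadratic form in the y's involves strong cancellation between large terms, so coefficients rounded to two decimals give only approximate agreement between the two forms.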


CHAPTER     10 


Discriminant  Functions 


10.1    INTRODUCTION 

The  individuals  forming  a  sample  can  often  be  classified  into  two  or 
more  groups.  It  is  then  of  interest  to  determine  which  of  the  characteristics 
of  these  individuals  enable  the  distinctions  between  groups  to  be  most 
clearly  made.  If  several  such  characteristics  can  be  measured  on  each  of 
the  individuals,  it  may  be  found  that  some  linear  function  of  these  variables 
is  more  efficacious  in  distinguishing  the  groups  than  is  any  one  of  them. 
Thus,  for  example,  certain  counts  on  fish  are  known  as  meristic  counts; 
the  sum  of  these  counts  for  any  individual  is  called  the  meristic  index  and 
is  used  as  the  basis  for  distinguishing  different  races  of  fish.  Likewise,  in 
anthropometry, various skull measurements have in the past been
combined to give what has been called a "coefficient of racial likeness."

Any  method  of  combining  different  variables  is  to  some  extent  arbitrary, 
so  that  the  choice  of  the  linear  combination  is  usually  based  on  some 
criterion  considered  to  be  appropriate.  The  most  satisfactory  feature  of 
the  indexes  mentioned  is  their  simplicity,  although  they  have  probably 
little  else  to  recommend  them.  Sometimes  the  manner  of  combining 
variables  will  be  dictated  by  the  conditions  of  the  problem.  Thus,  if 
wheat  and  straw  yields  resulting  from  different  treatments  have  been 
determined,  the  total  value  of  the  crop  for  each  treatment  is  likely  to  be  of 
interest;   then  each  yield  may  be  weighted  by  its  current  price. 

When  the  purpose  of  the  analysis  is  to  determine  a  linear  function  of  the 
variables  that  distinguishes  most  clearly  among  the  several  groups,  the 
linear  function  is  known  as  a  discriminant  function.  It  is  estimated  from 
the  data  as  that  function  for  which  the  ratio  of  the  sum  of  squares  between 
groups  to  the  residual  sum  of  squares  is  a  maximum.  The  idea  of  using  a 
discriminant function was first applied by Barnard (1935) to measurements
of Egyptian skulls, of known dynastic period, the object being to
classify  other  skulls,  of  unknown  age,  with  minimum  chance  of  error.  It 
has  since  been  applied  in  many  other  investigations  in  which  multiple 
measurements  can  be  made. 
In the present chapter, the uses of discriminant functions, and the
methods of determining them, will be outlined. Tests for the adequacy
of an assigned discriminant function (i.e., for some given system of
weighting the variables) to account for the relation existing between the
variables  and  the  group  effects  will  be  given.  The  theory  of  discriminant 
analysis  that  is  relevant  to  practical  needs  is  an  extension  of  multiple 
regression,  and  most  of  the  required  significance  tests  can  be  carried  out 
by  means  of  the  analysis  of  variance.  Thus,  as  will  be  shown  later, 
although  the  general  theory  of  multivariate  analysis  is  very  complicated 
mathematically,  the  theory  required  for  significance  tests  and  estimation  is 
not  beyond  the  scope  of  this  book. 

10.2    RELATIONSHIP  OF  DISCRIMINANT  FUNCTION 

FOR  A  SINGLE  COMPARISON  WITH  MULTIPLE 

REGRESSION 

Multiple regression relations may be validly determined by the methods
described in earlier chapters, provided the values of the dependent
variable $y$ are normally distributed about the regression function; there is
then no restriction on the distribution of the $x_i$, which may even be
formally defined variables, representing differences between groups within
the population. It can be shown that the theory likewise applies when $y$ is
only a formally defined variable (representing a difference between groups
or treatments, for example), provided the $x$ values then have normal
distributions, with the same covariance matrix, within each group. A
formal multiple regression of $y$ on the $x_i$ can be calculated and the standard
errors of the coefficients determined in the usual way. The linear
combination of the $x_i$ which makes up the regression function is known in
this case as a discriminant function, since it serves to discriminate between
the groups that the values of $y$ represent, and in fact maximizes the ratio
of the between-groups (i.e., regression) sum of squares to the total sum of
squares.
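The equivalence described above can be illustrated numerically. The sketch below uses synthetic data (not data from the text) and assumes NumPy; it compares the formal regression of a two-valued y on the x's with the two-group discriminant obtained from the within-group sums of squares and products, a form usually attributed to Fisher. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic illustration: two groups with a common covariance matrix
# and different means.
n1, n2 = 30, 40
cov = np.array([[1.0, 0.5], [0.5, 2.0]])
x_a = rng.multivariate_normal([0.0, 0.0], cov, size=n1)
x_b = rng.multivariate_normal([1.0, -0.5], cov, size=n2)
x = np.vstack([x_a, x_b])

# Formal dependent variable: any two distinct values will do.
y = np.concatenate([np.zeros(n1), np.ones(n2)])

# Multiple regression of y on x, using deviations from the means.
xc = x - x.mean(axis=0)
yc = y - y.mean()
b_reg = np.linalg.solve(xc.T @ xc, xc.T @ yc)

# Discriminant from the within-group sums of squares and products
# and the difference of group means.
d = x_b.mean(axis=0) - x_a.mean(axis=0)
dev_a = x_a - x_a.mean(axis=0)
dev_b = x_b - x_b.mean(axis=0)
w = dev_a.T @ dev_a + dev_b.T @ dev_b
b_within = np.linalg.solve(w, d)

# The two coefficient vectors differ only by a scale factor.
ratio = b_reg / b_within
print(ratio)
```

Both elements of `ratio` are equal, which reflects the arbitrary scale of the discriminant function discussed in the next section.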

When  there  are  more  than  two  groups  to  be  distinguished,  the  analysis 
as  a  formal  multiple  regression  can  still  be  applied,  provided  only  one  of 
the comparisons between groups is relevant. For example, Barnard (1935)
considered the linear regression of each of the variables on time as the
only relevant comparison. In the same way, Day and Sandomire (1942),
in  predicting  the  age  of  white  tail  deer,  determined  a  discriminant  function 
for  the  linear  regression  of  various  characters  on  a  measure  of  age.  In 
many  practical  problems  the  conditions  of  the  experiment  will  define  a 
single  comparison  which  is  relevant. 
10.3 SCALE CONVENTION

As  just  described,  the  determination  of  a  discriminant  function  in  no 
way  differs  from  that  of  a  multiple  regression  function.  The  scale  of  the 
discriminant  function  is,  however,  arbitrary,  since  the  values  given  to  y  to 
represent  the  two  groups  are  arbitrary;  in  other  words,  the  discriminant 
ratio  is  unaffected  if  all  the  coefficients  in  the  linear  function  are  increased 
in  the  same  proportion. 

It  is  convenient,  both  for  the  mathematical  analysis  and  for  numerical 
computations,  so  to  choose  the  scale  of  the  discriminant  function  that  the 
total  sum  of  squares  (or,  in  general,  the  sum  of  the  between-groups  and 
residual  sums  of  squares)  shall  be  unity.  The  advantage  of  this  convention 
is  that,  when  additional  variables  are  included  in  the  discriminant  function, 
the reduction in the residual sum of squares correctly reflects the improvement
in discrimination attributable to the new variables and can
immediately  be  tested  for  significance. 

This  convention  is  mentioned  here  because  some  workers  adopt  the 
convention  of  keeping  the  residual  sum  of  squares  constant.  It  should  be 
noted  that  any  such  convention  is  arbitrary  and  is  adopted  only  for 
convenience.

10.4    CALCULATION  OF  DISCRIMINANT  FUNCTION 
FOR  TWO  GROUPS  (OR  ANY  SINGLE  COMPARISON) 

When the comparison of interest is the difference of two groups, the
difference of group means for each of the $x_i$ (denoted by $d_i$) is proportional
to the formal sum of products of $y$ and $x_i$; thus in the usual normal
equations of multiple regression the $p_i$ may be replaced by quantities
proportional to the $d_i$. If the sizes of sample from the two groups are $n_1$
and $n_2$, the coefficient of proportionality, to make the total sum of squares
of $y$ unity, is $k$, where

$$k^2 = \frac{n_1 n_2}{n_1 + n_2}.$$

We accordingly have, with the usual notation, $t_{hi}$ denoting the total sum
of products of $x_h$ and $x_i$,

$$\sum_h b_h t_{hi} = k d_i.$$

Then the discriminant function is

$$Y = \sum_i b_i x_i.$$

The analysis of variance of $y$ is set out in Table 10.1, where $n = n_1 + n_2 - 1$.


TABLE 10.1

                                        D.F.      Sum of Squares

Regression on x1, x2, ..., xp           p         $k \sum_i b_i d_i = \sum_h \sum_i t_{hi} b_h b_i$

Residual                                n - p     $1 - k \sum_i b_i d_i$

Total                                   n         1

An  equivalent  result  is  given  by  the  analysis  of  variance  of  Y  between 
and  within  groups,  as  in  Table  10.2. 


TABLE 10.2

                     D.F.      Sum of Squares

Between groups       p         $k^2 (\sum_i b_i d_i)^2$

Within groups        n - p     $k \sum_i b_i d_i \, (1 - k \sum_i b_i d_i)$

Total                n         $\sum_h \sum_i t_{hi} b_h b_i = k \sum_i b_i d_i$


There  are  several  questions  that  can  be  answered  from  this  analysis 
(apart  from  the  over-all  significance  of  discrimination,  which  is  given  by 
the  analysis  directly).  We  may  first  of  all  be  interested  in  the  significance 
of  one  coefficient  or  a  set  of  them,  indicating  whether  the  corresponding 
variables contribute significantly to the discrimination. The sum of
squares for the coefficient $b_h$ is simply

$$b_h^2 / t^{hh},$$

which may be tested in the usual way against the mean square from the
first analysis. For computing purposes this sum of squares may be more
conveniently written as

$$k^2 \Big( \sum_i t^{hi} d_i \Big)^2 / t^{hh}.$$

Alternatively, the variance of $b_h$ is found by multiplying the residual mean
square by $t^{hh}$.
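The two ways of writing the sum of squares for a single coefficient can be compared numerically. The sketch below assumes NumPy and borrows the sums of squares and products and the mean differences from Table 10.5 of Example 10.1, which follows; the variable names are illustrative only.

```python
import numpy as np

# Total sums of squares and products t_hi (Residual + (L - R), 21 d.f.)
# and mean differences d_i, taken from Table 10.5.
T = np.array([[406.16, 212.20],
              [212.20, 267.43]])
d = np.array([2.25, -1.30])
k = np.sqrt(6.0)                      # k^2 = n1 n2 / (n1 + n2) with n1 = n2 = 12

T_inv = np.linalg.inv(T)              # elements t^{hi}
b = k * T_inv @ d                     # solution of the normal equations

# Sum of squares for each coefficient, in its two equivalent forms.
ss_direct = b**2 / np.diag(T_inv)
ss_computing = k**2 * (T_inv @ d)**2 / np.diag(T_inv)

# Variance of each coefficient: residual mean square times t^{hh}.
resid_df = 21 - 2                     # n - p
resid_ms = (1 - k * b @ d) / resid_df
var_b = resid_ms * np.diag(T_inv)
print(ss_direct, ss_computing, var_b)
```

Each element of `ss_direct` has one degree of freedom and would be compared with `resid_ms` in the usual variance-ratio test.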
Another aspect that will almost certainly need to be tested is the significance
of departure of the coefficients from any assigned set of coefficients
$\beta_i$ (this is a generalization of the over-all test given by the analysis of
variance). In particular, by putting some of the $\beta_i$ zero, we can test the
significance of the corresponding variables. Since the scale of the given
coefficients may be arbitrary, it needs to be adjusted to make the given
coefficients comparable with the estimated ones. It can be shown that
the scale convention is such that

$$\sum_h \sum_i t_{hi} \beta_h \beta_i = k \sum_i \beta_i d_i.$$

With this convention, the sum of squares between groups for the hypothetical
discriminator

$$\eta = \sum_i \beta_i x_i$$

is

$$k \sum_i \beta_i d_i$$

with one degree of freedom. The sum of squares for departure from the
hypothetical discriminator is then

$$k \sum_i (b_i - \beta_i) d_i$$

with $p - 1$ degrees of freedom. We thus have the analysis in Table 10.3.

TABLE 10.3

                                                D.F.      Sum of Squares

Hypothetical discriminator                      1         $k \sum_i \beta_i d_i$

Departure from hypothetical discriminator       p - 1     $k \sum_i (b_i - \beta_i) d_i$

Residual                                        n - p     $1 - k \sum_i b_i d_i$

Total                                           n         1


Example  10.1  Effect  of  Milking  Treatment  on  Lactation  Rate  of 
Merino  Ewes.  To  test  the  effect  of  a  milking  treatment  on  the  lactation 
rate  of  ewes,  twelve  ewes  were  chosen  and  allotted  at  random  to  three  groups  of 
four.  At  any  stage  of  the  experiment  those  of  one  group  were  left  untreated 
(treatment  O),  those  of  another  had  the  treatment  applied  to  the  left  half  of  the 
udder  (treatment  L),  and  the  remaining  group  had  the  treatment  applied  to  the 
right  half  (treatment  R).  The  treatments  were  applied  at  three  different  stages 
of  lactation,  two  weeks,  six  weeks,  and  twelve  weeks,  the  arrangement  forming 
four Latin squares, so that each ewe received each treatment at one or another
stage of lactation. The treatment layout is shown in Table 10.4, together with
lactation rates (ml./hr.) determined for each half and denoted by $l$ and $r$.


TABLE 10.4
Lactation Rates (ml./hr.) of Ewes under Different Treatments

                              Stage of Lactation

Ewe          2 weeks                 6 weeks                 12 weeks
No.     Treat.    l      r      Treat.    l      r      Treat.    l      r

 1        L     40.3   23.2       R     17.4   15.0       O      7.4    6.8
 2        O     26.6   26.6       L     16.9   14.0       R      6.1    5.8
 3        R     29.5   26.7       O     16.2   14.6       L      9.9    8.1
 4        L     19.9   14.2       R     16.1   11.6       O     11.0    7.0
 5        O     18.4   23.4       L      8.3   14.3       R      2.3   11.2
 6        R     23.1   26.3       O     13.3   13.4       L      8.7    7.8
 7        R     21.8   29.5       L     13.6   13.8       O      9.8    9.4
 8        O     24.2   20.3       R     16.5   14.3       L      8.5    6.9
 9        L     17.3   15.7       O     13.8   18.8       R      5.1    8.1
10        R     17.7   17.7       L     14.3   12.1       O      8.4    8.3
11        O     19.4   17.8       R     11.7   11.3       L      5.7    6.1
12        L     42.3   37.5       O     18.1   19.3       R     11.4   11.8

L = treatment applied to left half
R = treatment applied to right half
O = no treatment applied

The left and right figures are lactation rates for left and right halves respectively.
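The treatment totals used later in the example can be recovered directly from Table 10.4. The sketch below transcribes the table into a plain Python list (the layout of `data` is an assumption made for illustration) and forms, for each half, the total under treatment L minus the total under treatment R.

```python
# Each row: ewe number, then (treatment, l, r) at 2, 6 and 12 weeks,
# transcribed from Table 10.4.
data = [
    (1,  ("L", 40.3, 23.2), ("R", 17.4, 15.0), ("O", 7.4, 6.8)),
    (2,  ("O", 26.6, 26.6), ("L", 16.9, 14.0), ("R", 6.1, 5.8)),
    (3,  ("R", 29.5, 26.7), ("O", 16.2, 14.6), ("L", 9.9, 8.1)),
    (4,  ("L", 19.9, 14.2), ("R", 16.1, 11.6), ("O", 11.0, 7.0)),
    (5,  ("O", 18.4, 23.4), ("L", 8.3, 14.3), ("R", 2.3, 11.2)),
    (6,  ("R", 23.1, 26.3), ("O", 13.3, 13.4), ("L", 8.7, 7.8)),
    (7,  ("R", 21.8, 29.5), ("L", 13.6, 13.8), ("O", 9.8, 9.4)),
    (8,  ("O", 24.2, 20.3), ("R", 16.5, 14.3), ("L", 8.5, 6.9)),
    (9,  ("L", 17.3, 15.7), ("O", 13.8, 18.8), ("R", 5.1, 8.1)),
    (10, ("R", 17.7, 17.7), ("L", 14.3, 12.1), ("O", 8.4, 8.3)),
    (11, ("O", 19.4, 17.8), ("R", 11.7, 11.3), ("L", 5.7, 6.1)),
    (12, ("L", 42.3, 37.5), ("O", 18.1, 19.3), ("R", 11.4, 11.8)),
]

# Accumulate totals of l and r under each treatment.
totals = {trt: [0.0, 0.0] for trt in "LRO"}
for _, *stages in data:
    for trt, l, r in stages:
        totals[trt][0] += l
        totals[trt][1] += r

# Totals under L minus totals under R, for the left and right halves.
diff_l = totals["L"][0] - totals["R"][0]
diff_r = totals["L"][1] - totals["R"][1]
print(round(diff_l, 1), round(diff_r, 1))
print(round(diff_l / 12, 2), round(diff_r / 12, 2))
```

These differences and their means over the twelve ewes are the figures entered at the foot of Table 10.5.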


In  the  analysis,  the  treatment  effects  are  separated  into  two  orthogonal 
comparisons : 

(i) average difference between treated and control, $\tfrac{1}{2}(L + R) - O$, and
(ii) difference between treatment of left half and treatment of right half, $L - R$.
The analysis of variance and covariance of the two measurements is shown in
Table 10.5. For simplicity, only as much of the analysis as is required for the
present example is given here. It is to be expected that the effect (i) will be the
same for both halves, and that it can be efficiently estimated from the sum $l + r$;
on the other hand, the effect (ii) for one half will be opposite to that for the
other half and may be estimated from the difference $l - r$. The analyses of the
quantities $l + r$ and $l - r$ are therefore also shown in Table 10.5.

The  analysis  shows  that  effect  (i),  namely  the  average  difference  between 
treatments  and  control,  is  negligible;  this  would  imply  that,  if  any  effect  of 
treatment  exists,  it  is  to  divert  milk  from  one  half  to  the  other  without  altering 
its total amount. This is confirmed by the analysis of $l - r$, which shows that
treatment  of  the  left  half  increases  the  rate  of  lactation  for  the  left  half  at  the 
expense  of  the  right,  and  vice  versa.    The  averages  are 

Treated  half  32.92  ml./hr. 

Untreated  half  29.37  ml./hr. 

Difference  3.55  ml./hr. 

These  results  complete  the  main  part  of  the  analysis  of  the  data.  It  is, 
however,  of  some  interest  to  test  whether  there  is  any  asymmetry  between  the 
halves, which would be indicated by some combination other than $l - r$ showing
up more significantly the effect $L - R$. The relevant test is that of the
hypothetical discriminator $l - r$.

From the sums of squares in Table 10.5 and the differences shown at the foot
we may write down the normal equations for the coefficients of $l$ and $r$ as follows:

$$406.16 b_l + 212.20 b_r = 2.25k$$

$$212.20 b_l + 267.43 b_r = -1.30k$$

where $k = \sqrt{6}$ since $n_l = n_r = 12$.
The solutions are

$$b_l = 0.013\,800\,4k$$

$$b_r = -0.015\,811\,4k,$$

TABLE 10.5

Sums of Squares and Products of Results for Left and Right
Halves; and Sums of Squares for Total and Difference

                        D.F.     l^2        lr         r^2       l + r      l - r

Sheep                    11     539.74     232.20     321.07    1325.21     396.41
Stages                    2    1796.57    1578.62    1387.63    6341.44      26.96
Treatments
  L + R - 2(O)            1       1.74      -1.31       0.98       0.10       5.34
  L - R                   1      30.38     -17.55      10.14       5.42      75.62**
Residual                 20     375.78     229.75     257.29    1092.57     173.57

Residual + (L - R)       21     406.16     212.20     267.43        ..      249.19
Total                    35    2744.21    2021.71    1977.11    8764.74     677.90

Residual mean square                                              54.63      8.678

                                  l          r        l + r      l - r
Total of results L - R           27.0      -15.6       11.4       42.6
Mean of results L - R            2.25      -1.30

** Significant at 1 per cent level
so that the sum of squares for the treatment effect (ii) is

$$6(0.013\,800\,4 \times 2.25 + 0.015\,811\,4 \times 1.30) = 0.3096.$$

Now  for  the  hypothetical  discriminator  the  corresponding  treatment  sum  of 
squares  is 

75.62/249.19  =  0.3035, 

so  that  the  analysis  of  variance  is  as  shown  in  Table  10.6.    It  is  seen  that,  as 


TABLE 10.6
Analysis to Test the Discriminant Function

                                             D.F.    Sum of Squares    Mean Square

L - R, based on l - r                          1         0.3035
Additional due to discriminant function        1         0.0061          0.0061 (n)

Total for discriminant function                2         0.3096
Residual                                      19         0.6904          0.03634

Residual + (L - R)                            21         1.0000

(n) Not significant


might be expected, the adjustment of the ratio $b_l : b_r$ from the theoretical value
$-1$ to the value derived from the data has not had any significant effect on the
discriminant  ratio.  The  calculations  show  how  the  existence  of  any  asymmetry 
may  be  tested  and  its  magnitude  assessed. 
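The calculations of Example 10.1 can be retraced with a short sketch, assuming NumPy. The matrix and mean differences are those of Table 10.5; the names `T`, `d`, `c`, and the derived quantities are illustrative labels rather than notation from the text.

```python
import numpy as np

# Sums of squares and products for Residual + (L - R), 21 d.f. (Table 10.5).
T = np.array([[406.16, 212.20],
              [212.20, 267.43]])
d = np.array([2.25, -1.30])              # mean differences L - R for l and r
n_l = n_r = 12
k = np.sqrt(n_l * n_r / (n_l + n_r))     # k^2 = 6

# Fitted discriminant coefficients b_l, b_r and their sum of squares (2 d.f.).
b = k * np.linalg.solve(T, d)
ss_fitted = k * b @ d

# Hypothetical discriminator l - r (1 d.f.): between-groups sum of squares
# expressed as a fraction of its total sum of squares.
c = np.array([1.0, -1.0])
ss_hyp = k**2 * (c @ d)**2 / (c @ T @ c)

# Additional sum of squares due to fitting, tested against the residual.
ss_extra = ss_fitted - ss_hyp
resid_ms = (1 - ss_fitted) / 19
f_ratio = ss_extra / resid_ms
print(b / k, ss_fitted, ss_hyp, ss_extra, resid_ms, f_ratio)
```

The variance ratio is well below unity, in line with the conclusion of Table 10.6 that the fitted function does not improve significantly on $l - r$. Small differences in the fourth decimal arise because the text rounds 75.62 before dividing.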


10.5    DISCRIMINANT  FUNCTION  FOR  A  REGRESSION 
RELATIONSHIP 

Before  considering  the  general  case  of  discriminant  analysis  when  there 
are  more  than  two  comparisons  of  interest  among  the  groups  to  be 
compared,  we  now  consider  the  case  in  which  the  comparison  of  interest  is 
the  regression  of  the  variable  on  some  known  variable.  For  example, 
Finney  (1952),  describing  possible  applications  of  multiple  measurements 
to biological assay, points out that the linear regression of each measurement
on the dosage is the relevant comparison, and that a combination of
several  measurements  should  be  used  only  as  it  materially  increases  the 
precision  of  this  regression.  Again,  in  Barnard's  (1935)  investigation  of 
Egyptian  skulls,  the  variation  of  different  characters  with  time  was  being 
studied,  the  object  being  to  date  subsequently  found  skulls  of  unknown 
origin.     Hence,  the  group  comparison  considered  was  the  regression  of 
each character on time. The discriminant function chosen was that
combination of characters most highly correlated with time; and, since
the  variation  between  groups  was  represented  by  a  single  comparison,  the 
discriminant  function  could  be  determined  by  the  general  methods  given 
in  Section  10.4.  The  discriminant  function  in  this  case  is  actually 
equivalent to the multiple regression function of time on the skull
measurements.

Further  analyses  of  these  data  have  already  been  made  by  Bartlett 
(1947)  and  Rao  (1952).  The  analysis  which  follows  differs  somewhat  from 
each  of  these,  although  we  have  used  some  of  Rao's  results. 

Example  10.2  Discriminant  Function  for  Determining  Age  of 
Skulls.     In  Table  10.7  are  shown,  for  four  series  of  Egyptian  skulls,  the 

TABLE 10.7

Means of Four Characters in Four Series of Skulls

(From Rao, Table Id. 5a)

Series     N        x1             x2            x3            x4

I          91    133.582 418    98.307 692    50.835 165    133.000 000
II        162    134.265 432    96.462 963    51.148 148    134.882 716
III        70    134.731 429    96.857 143    50.100 000    133.642 857
IV         75    135.306 667    95.040 000    52.093 333    131.466 667

means of the following four measurements: basialveolar height ($x_2$), nasal
height ($x_3$), maximum breadth ($x_1$), and basibregmatic height ($x_4$). Table 10.8

TABLE 10.8

Total Sums of Squares and Products of the Skull
Measurements and the Time Variable

(397 degrees of freedom)

(cf. Rao, Table Id. 5/5)

          x1          x2          x3          x4           t

x1     9785.18      214.20     1217.93     2019.82      718.76
x2      214.20     9559.46     1131.72     2381.13    -1407.26
x3     1217.93     1131.72     4088.73     1133.47      410.10
x4     2019.82     2381.13     1133.47     9382.24     -733.43
t                                                      4307.67


shows  the  over-all  sums  of  squares  and  products  of  the  four  variables.  Since 
the  time  intervals  between  the  four  series  are  assumed  to  be  in  the  ratios  2:1  :  2, 
the  time  variate  is  given  the  values  —5,  —1,  1,  5  for  the  four  series.    The  sum 
of products of each measurement with time as thus defined and the sum of
squares  of  time  are  also  shown  in  Table  10.8.  From  these  the  multiple  regression 
of  time  on  the  measurements  is  determined.  Table  10.9  gives  the  inverse 
matrix  and  the  partial  regression  (or  discriminant)  coefficients,  and  Table  10.10 


TABLE 10.9

Inverse Matrix for the x Variables; Regression Coefficients
and Standard Errors

              Inverse matrix (multiplied by 10^6)               b            S.E.

x1     110.119      6.355    -28.496    -21.877           0.074 565    ± 0.0332
x2       6.355    114.329    -25.985    -27.244          -0.146 998    ± 0.0338
x3     -28.496    -25.985    265.622    -19.360           0.139 217    ± 0.0516
x4     -21.877    -27.244    -19.360    120.547          -0.073 737    ± 0.0347


TABLE 10.10
Analysis of Variance of the Time Variate t

               D.F.     Sum of Squares     Mean Square

Regression       4          371.63
Residual       393         3936.04            10.02

Total          397         4307.67


gives the analysis of variance, from whose error mean square the standard
errors of the coefficients are calculated. Although this analysis is formal only,
in that the appropriate regression is that of the measurements on time, it
nevertheless provides valid standard errors and significance tests, as explained earlier.
The analysis shows that all the variables contribute significantly to the
regression, the contributions of $x_1$ and $x_4$ being significant at the 5 per cent
level, the other two at the 1 per cent level. If the regression of the population
discriminator on time is assumed to be linear, simultaneous fiducial limits for
the coefficients $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$ may be determined as the sets of values that
make significant the sum of squares for departure of the regression coefficients
from their theoretical values, as described in Chapter 3.
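The figures of Example 10.2 can be reproduced from the tabulated quantities. The sketch below assumes NumPy; `C` holds the inverse matrix of Table 10.9, `p` the sums of products with time from Table 10.8, and the remaining names are illustrative.

```python
import numpy as np

# Series sizes (Table 10.7) and the time values -5, -1, 1, 5;
# sum of squares of time about its mean.
N = np.array([91, 162, 70, 75])
t = np.array([-5.0, -1.0, 1.0, 5.0])
ss_t = np.sum(N * t**2) - np.sum(N * t)**2 / N.sum()

# Inverse matrix (Table 10.9) and sums of products with time (Table 10.8).
C = 1e-6 * np.array([
    [110.119,   6.355, -28.496, -21.877],
    [  6.355, 114.329, -25.985, -27.244],
    [-28.496, -25.985, 265.622, -19.360],
    [-21.877, -27.244, -19.360, 120.547]])
p = np.array([718.76, -1407.26, 410.10, -733.43])

b = C @ p                                 # discriminant coefficients
ss_reg = b @ p                            # regression sum of squares, 4 d.f.
resid_ms = (ss_t - ss_reg) / 393          # residual mean square
se = np.sqrt(resid_ms * np.diag(C))       # standard errors of the coefficients
t_ratios = b / se
print(np.round(b, 6), np.round(se, 4), np.round(t_ratios, 2))
```

The ratios of the coefficients to their standard errors give the individual tests referred to in the text, the ratios for the first and fourth variables being the smallest in absolute value.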


10.6    TEST  OF  AN  ASSIGNED  DISCRIMINANT  FUNCTION 

In  the  previous  section  we  have  considered  the  analysis  in  which, 
although  there  are  more  than  two  groups,  the  discriminant  function  is 
determined  with  respect  to  one  specified  comparison  between  groups. 
We  now  consider  the  converse  of  this,  where  the  discriminator  is  a  specified 
linear  combination  of  the  independent  variables.  The  analysis  of  variance 
of  such  a  discriminator  provides  the  needed  test  for  the  significance  of  the 
group differences which it is supposed to reveal, and it can be seen that
such  an  analysis  is  of  the  same  form  as  the  analysis  in  which  a  group 
comparison  is  specified.  We  may,  however,  take  the  analysis  further  and 
test  also  the  adequacy  of  the  specified  discriminator,  taking  into  account 
the  variation  between  groups  of  the  original  independent  variables.  A 
corresponding  analysis  would  have  been  possible  with  the  example  given 
in  the  last  section,  wherein  we  could  have  tested  the  linearity  of  the 
regression  of  time  on  the  measurement  variables,  or,  in  other  words,  the 
adequacy  of  time  to  represent  the  variation  between  groups.  This 
analysis  will  in  fact  be  made  in  the  next  section,  but  since  there  are  more 
than  two  comparisons  between  groups,  the  analysis  is  a  little  more 
complicated  than  that  in  which  there  are  only  two  comparisons  (or 
correspondingly,  two  independent  variables).  We  here  consider  the  test 
for  the  adequacy  of  a  discriminator  based  on  two  independent  variables. 
In  such  a  case,  the  effect  of  the  given  discriminator  may  be  eliminated  by 
covariance,  and  the  adjusted  analysis  provides  the  test  of  its  adequacy. 

In  the  general  case  with  more  than  two  groups,  when  there  are  two  or 
more  group  comparisons,  the  analysis  of  the  data  and  the  tests  that  are 
required are more complicated. The linear function that best discriminates
one comparison is not generally best for another comparison; in
other  words,  a  single  discriminator  is  not  usually  adequate  to  specify  all 
the  differences  among  the  groups.  Such  a  specification  is  adequate  only 
when  the  different  groups  or  populations  are  collinear,  which  means  that 
the  changes  in  the  mean  values  for  the  different  variables  from  one 
population  to  another  are  proportional. 

Thus  it  will  be  seen  that,  with  more  than  two  groups,  the  test  for  the 
adequacy  of  a  given  discriminator  has  two  aspects:  the  test  for  the 
assigned  coefficients,  which  is  similar  to  the  test  given  when  there  are  two 
groups,  and  may  be  called  the  test  for  direction,  and  the  even  more  crucial 
test  for  the  collinearity  of  the  groups,  which  is  actually  a  test  of  whether 
any  single  discriminator  is  adequate  to  specify  group  differences.  If  the 
test  shows  significant  departure  in  direction,  a  discriminant  function  with 
different coefficients may still be tried; but if the test shows significant
departure  of  the  data  from  collinearity,  no  single  discriminant  function 
can  adequately  specify  the  differences  among  the  populations.  In 
practice,  of  course,  even  if  the  groups  are  not  collinear,  it  may  still  be 
convenient  to  use  a  single  linear  function  if  it  discriminates  satisfactorily. 

Example 10.3 Discrimination between Hybrid Strains of Eucalyptus.
Between two widely separated stands of Eucalyptus trees, one
considered to be E. maculosa and the other E. elaeophora, was a region containing
trees believed to be hybrids of these two species. The presumed hybrid trees
were classified on the basis of botanical characteristics into groups believed to
be of approximately the same genetical composition. Specimens of wood
were  taken  from  a  number  of  trees,  both  of  the  parental  types  and  the  hybrids, 
and  determinations  of  density  and  maximum  compressive  strength  were  made 
on  each.  The  purpose  of  the  investigation  was  to  determine  whether  either 
density  or  maximum  compressive  strength,  or  some  combination  of  them, 
satisfactorily  discriminated  the  different  botanical  types. 

The  relevant  data  from  this  investigation  are  set  out  in  Table  10.11.    In  this 

TABLE 10.11

Mean Density and Maximum Compressive Strength for Parental
and Hybrid Groups, of E. maculosa and E. elaeophora

Botanical Classification,     Number of     Density,     Maximum Compressive
percentage E. maculosa        Trees         x1           Strength, x2

      0                         10           43.61          5500
  10-30                          3           45.33          5550
  40-60                          3           39.67          4670
  70-90                          5           40.58          4910
    100                         10           37.01          4569

  Total                         31           40.78          5029

example  there  are  five  groups  (the  parental  types  and  three  presumed  hybrid 
groups).  As  the  botanical  classification  is  not  definite  enough  to  provide  a 
variable  with  which  to  correlate  the  properties  of  the  wood,  the  dispersion  is 
analyzed between and within groups. The sums of squares and products of
density ($x_1$) and maximum compressive strength ($x_2$) are shown in Table 10.12.

TABLE 10.12

Sums of Squares and Products of Density x1 and Maximum
Compressive Strength x2

                                D.F.     x1^2       x1 x2        x2^2

Between classes                   4     288.33     39,106      5,606,200
Within classes                   26     191.06     25,731      7,126,100

Total                            30     479.39     64,837     12,732,300

Fraction of sum of squares
between classes                           0.60                      0.44

It  is  seen  that,  whereas  60  per  cent  of  the  variation  in  density  is  between  groups, 
there  is  only  44  per  cent  between  groups  for  compressive  strength.  Taking 
density  as  the  specified  discriminator,  we  need  to  test  whether  compressive 
strength  adds  anything  to  the  discrimination.  The  test  is  carried  out  by  means 
of  an  analysis  of  covariance  as  described  in  Chapter  7.  After  adjustment  for 
density,  the  variation  of  compressive  strength  between  groups  is  tested.    The 
adjusted sum of squares between groups breaks up into two parts, the reduced
sum  of  squares  and  a  sum  of  squares  for  difference  of  regressions  within  groups 
and  between  groups.  The  reduced  sum  of  squares  gives  evidence  whether  a 
single  discriminator  is  adequate  for  specifying  differences  among  the  groups; 
the  "difference  of  regressions"  term  indicates  whether  the  chosen  discriminator 
could  be  significantly  improved  and  hence  provides  the  test  of  direction  of  the 
discriminator.  The  analysis,  which  is  set  out  in  Table  10.13,  shows  that  density 
is  a  satisfactory  discriminator  for  these  data. 

TABLE 10.13

Analysis of Variance of Maximum Compressive Strength x2,
Adjusted for Density x1

                                              D.F.    Sum of Squares    Mean Square

Difference of regressions (Direction)           1             100            100 (n)
Between classes, reduced (Collinearity)         3         302,300        100,800 (n)
Within classes, reduced                        25       3,660,800        146,400

Total, reduced                                 29       3,963,200

(n) Not significant
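The covariance adjustment behind Table 10.13 can be sketched from the entries of Table 10.12. The dictionary layout and the helper `reduced` are assumptions made for illustration; the reduced sum of squares is the sum of squares of x2 less the part accounted for by its regression on x1.

```python
# Sums of squares and products from Table 10.12
# (x1 = density, x2 = maximum compressive strength).
between = {"x1x1": 288.33, "x1x2": 39106.0, "x2x2": 5606200.0}
within = {"x1x1": 191.06, "x1x2": 25731.0, "x2x2": 7126100.0}
total = {key: between[key] + within[key] for key in between}


def reduced(ssp):
    """Sum of squares of x2 after removing its regression on x1."""
    return ssp["x2x2"] - ssp["x1x2"] ** 2 / ssp["x1x1"]


total_red = reduced(total)          # 29 d.f.
within_red = reduced(within)        # 25 d.f.
between_red = reduced(between)      # 3 d.f., test of collinearity

# Remaining 1 d.f.: difference of regressions within and between groups,
# the test of direction.
direction = total_red - within_red - between_red

within_ms = within_red / 25
f_collinearity = (between_red / 3) / within_ms
f_direction = direction / within_ms
print(round(total_red), round(within_red), round(between_red), round(direction))
print(round(f_collinearity, 2), round(f_direction, 4))
```

The direction term is a small difference between large numbers, so its value is sensitive to the rounding of the tabulated sums of squares; the text quotes it to the nearest hundred. Both variance ratios fall below unity, consistent with the table's verdict of non-significance.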

10.7    COMPREHENSIVE  ANALYSIS  WITH  MORE  THAN 

TWO  VARIABLES  AND  MORE  THAN  TWO 

COMPARISONS  BETWEEN  GROUPS 

In  general,  when  p  and  q  (the  numbers  of  x  and  y  variables  (or  group 
comparisons)  respectively)  both  exceed  two,  the  analysis  required  is  a 
generalization  of  the  analysis  of  covariance.  We  need  still  to  test  two 
aspects  of  the  data — the  collinearity  of  the  groups  and  the  adequacy  of  the 
specified  discriminator — but,  since  there  are  more  than  two  variables  in 
either  set,  the  elimination  of  the  discriminator  by  covariance  leaves  more 
than  one  adjusted  variable.  The  test  for  collinearity  has  therefore  to  be 
based  on  determinants  of  sums  of  squares  and  products,  which  take  the 
place  of  the  reduced  sums  of  squares  appearing  in  the  earlier  examples. 

Besides  these  detailed  tests,  it  is  also  sometimes  useful  to  make  an 
over-all  test  of  the  adequacy  of  the  given  discriminant  function.  Such  an 
over-all  test  has  been  described  by  Bartlett,  Rao,  and  others.  Although 
this  test  is  not  of  such  practical  interest  or  so  readily  interpretable  as  the 
detailed  tests,  it  is  given  below  for  completeness,  and  because  it  gives  the 
simplest  introduction  to  the  methods  of  test. 

To  exemplify  the  general  method  we  apply  it  to  the  further  analysis  of 
Barnard's  data  on  Egyptian  skulls. 

Example 10.4 Further Examination of Data Presented in Example
10.2. In the treatment given in Example 10.2, no cognizance was taken of
there being more than two series of skulls, only the regression of the characters
on  time  being  considered.  We  now  consider  the  adequacy  of  the  given  function 
(i.e.,  time)  to  express  the  differences  between  series.  Even  if  the  regression  of 
time  on  the  skull  measurements  is  significant,  there  are  still  the  questions  (i) 
whether  any  other  variable  representing  a  comparison  among  series  has  higher 
correlation  with  the  skull  measurements,  and  (ii)  whether  the  different  skull 
measurements  conform  with  this  discriminator,  or  whether  more  than  one  such 
comparison  between  series  is  needed  to  take  into  account  the  variation  among 
the  measurements.  Tests  of  these  questions  will  be  called  tests  of  direction  and 
collinearity  respectively. 

It  should  be  noted  here  that,  whereas  the  hypothetical  discriminator  is  usually 
a  function  of  the  independent  variables  (in  particular,  it  may  be  one  of  them), 
in  this  example  the  "hypothetical  discriminator"  is  a  function  of  the  group 
comparisons  (i.e.,  time).  However,  on  account  of  the  duality  mentioned  in 
Section  10.2,  this  reversal  of  the  roles  of  dependent  and  independent  variables 
does  not  affect  the  analysis. 

(i)  The  over-all  test  of  time  as  a  discriminator 

The over-all test is a test of whether the simultaneous variation of
$x_1$, $x_2$, $x_3$, and $x_4$ is linearly related to time, or whether there is significant
departure from linearity. The test may be developed in the following way.
Each of the sums of squares and products of the four variables may be
partitioned into parts attributable to

                              Degrees of Freedom

Regression on time                     1
Deviation from regression              2
Residual                             394
Residual plus deviation              396
Total                                397

The relevant sums of squares and products are shown in Tables 10.14 and
10.15. Then, on the null hypothesis, the sums of squares and products for
(deviation + residual), with 396 degrees of freedom, will be free of group
differences after each variable has been adjusted for possible effects of each other
one. This requirement may be expressed in the fact that the determinant of
these sums of squares and products will be independent of group differences.

TABLE 10.14

Sums of Squares and Products for Deviation from Regression on Time
(Residual + deviation, 396 degrees of freedom)
(From Rao, Table Id. 5e)

           $x_1$      $x_2$      $x_3$      $x_4$

$x_1$    9665.25     449.01    1149.50    2142.20
$x_2$     449.01    9099.73    1265.69    2141.52
$x_3$    1149.50    1265.69    4049.70    1203.30
$x_4$    2142.20    2141.52    1203.30    9257.37

Sum of squares of $T$ = 371.63

Determinant = $2699.60 \times 10^{12}$

TABLE 10.15

Sums of Squares and Products for Residual
(394 degrees of freedom)
(From Rao, Table Id. 50)

           $x_1$      $x_2$      $x_3$      $x_4$

$x_1$    9662.00     445.57    1130.62    2148.58
$x_2$     445.57    9073.12    1239.22    2255.81
$x_3$    1130.62    1239.22    3938.32    1271.05
$x_4$    2148.58    2255.81    1271.05    8741.51

Sum of squares of $T$ = 335.80

Determinant = $2426.91 \times 10^{12}$


This  determinant  can  then  be  compared  with  the  corresponding  determinant  for 
the  residual  sums  of  squares  and  products,  and  their  ratio  tested  for  significance. 

If $V$ = matrix of residual sums of squares and products

and $Q$ = matrix of residual plus deviation sums of squares and products,

the ratio is

$$R = \frac{|V|}{|Q|}.$$

This is called a (396 : 4, 2) determinantal ratio, to indicate the fact that there
are 396 degrees of freedom in all, 4 independent variables, and 2 degrees of
freedom between groups. In general, we would have an $(n - 1 : p, q - 1)$ ratio.
In view of the duality, ratios $(n : p, q)$ and $(n : q, p)$ have the same distribution.
In the present example, we see from Tables 10.14 and 10.15 that

$$|V| = 2426.91 \times 10^{12}$$

and

$$|Q| = 2699.60 \times 10^{12},$$

so that

$$R = 0.89899.$$

Small values of this ratio are significant. To test significance, we use a
result of Rao (1951) that an $(n : p, q)$ ratio is distributed like the $s$th power of a

$$\{s[n - \tfrac{1}{2}(p + q + 1)] + \tfrac{1}{2}pq + 1 : pq, 1\}$$

ratio, to a very close approximation. Here

$$s^2 = \frac{p^2 q^2 - 4}{p^2 + q^2 - 5}.$$

The result is an identity when either $p$ or $q$ is one and is exact when $p$ or $q$ is two,
as was proved by Wilks (1932).

In this case, since $q = 3$, we see that $R(n - 1 : p, q - 1)$ is distributed as the
square of a

$$(2n - 4 : 2p, 1) = (790 : 8, 1)$$

ratio. It may therefore be tested by the $F$ test with 8 and 782 degrees of freedom:

$$F = \frac{782}{8} \cdot \frac{1 - \sqrt{R}}{\sqrt{R}} = \frac{782}{8} \cdot \frac{0.05185}{0.94815} = 5.346 \quad \text{(significant at 1 per cent level).}$$

This is similar to the approximate result given by Rao (1952, page 271), for which

$$\chi^2 \ (8 \text{ D.F.}) = 40.02 \quad \text{(significant at 1 per cent level).}$$

Since  the  over-all  test  reveals  departure  from  the  linear  relation  with  time, 
further  analysis  is  of  interest  to  see  what  this  departure  consists  of. 
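The over-all test above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming the determinants quoted from Tables 10.14 and 10.15; the function name `rao_f` and its argument names are illustrative, not part of the original text. The special case $s = 1$ when $p^2 + q^2 = 5$ is an assumption added to avoid the indeterminate form 0/0 in the formula for $s^2$; it agrees with the statement that the result is an identity when $p$ or $q$ is one.

```python
import math


def rao_f(ratio, n, p, q):
    """Approximate F test for an (n : p, q) determinantal ratio (hypothetical helper).

    Returns (F, df1, df2), where small values of `ratio` give large F.
    """
    denom = p * p + q * q - 5
    # When p*p + q*q == 5 the formula for s**2 reads 0/0; s = 1 is assumed there.
    s = 1.0 if denom == 0 else math.sqrt((p * p * q * q - 4) / denom)
    df1 = p * q
    # Total degrees of freedom of the equivalent (N : pq, 1) ratio.
    total = s * (n - 0.5 * (p + q + 1)) + 0.5 * p * q + 1
    df2 = total - df1
    root = ratio ** (1.0 / s)  # the s-th root of the determinantal ratio
    f = (df2 / df1) * (1.0 - root) / root
    return f, df1, df2


# Determinants as quoted in the text (Tables 10.15 and 10.14).
det_v = 2426.91e12  # residual
det_q = 2699.60e12  # residual plus deviation
R = det_v / det_q

f_overall, df1, df2 = rao_f(R, n=396, p=4, q=2)
print(f"R = {R:.5f}, F = {f_overall:.2f} on {df1} and {df2:.0f} degrees of freedom")
```

With these inputs the sketch gives 8 and 782 degrees of freedom and an $F$ of about 5.35, in line with the hand calculation; the last digit can differ slightly because the text rounds $\sqrt{R}$ before dividing.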

(ii)  The  test  for  collinearity 

The  over-all  criterion  consists  of  two  factors,  one  giving  a  test  for 
departure  from  collinearity  and  the  other  giving  a  test  for  departure  in 
direction.  Two  alternative  factorizations  are  actually  possible,  analogous 
to  the  two  different  analyses  for  determining  a  simple  and  a  partial 
regression  coefficient.  We  consider  first  the  factorization  of  the  over-all 
criterion  into  a  "simple  direction"  and  a  "partial  collinearity"  factor, 
which  is  appropriate  for  testing  collinearity. 

The  regression  function  of  t  on  the  x  variables,  namely 

$$T = b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4,$$

is determined as the linear function of the $x_i$ which minimizes the sum of squares
for  residual  and  deviation  from  regression  as  a  whole.  Accordingly,  the  sum  of 
squares  of  T  may  be  partitioned  into  parts  for  deviation  and  residual,  with  2  and 
394  degrees  of  freedom  respectively,  regardless  of  the  fact  that  the  regression 
coefficients  are  estimated  from  the  data.  This  analysis  of  T  provides  the  factor 
for  "simple  direction."  The  sums  of  squares  for  T  required  here  are  already 
shown  in  Tables  10.14  and  10.15.    Thus  we  find  for  the  "simple  direction"  ratio 

(396:1,2)  =335.80/371.63 
=  0.90359. 

Elimination  of  T  reduces  the  total  degrees  of  freedom  and  the  number  of  x 
variables  each  by  1,  so  that  for  the  "partial  collinearity"  factor  we  have 

(395:3,  2)  =  0.89899/0.90359 
=  0.99491. 
This ratio may now be tested by the $F$ test (6 and 782 degrees of freedom):

$$F = \frac{782\,(1 - \sqrt{0.99491})}{6\,\sqrt{0.99491}} = \frac{782}{6} \cdot \frac{0.00255}{0.99745} = 0.33.$$

Thus,  there  is  no  evidence  for  departure  from  collinearity  in  the  data. 
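The factorization of the over-all criterion into "simple direction" and "partial collinearity" factors can be checked numerically. This is a small sketch using only the figures quoted in the text; the variable names are illustrative, and the use of the square root reflects $s = 2$ for a (395 : 3, 2) ratio.

```python
import math

# Sums of squares of T as quoted from Tables 10.15 and 10.14.
ss_t_residual = 335.80
ss_t_residual_plus_deviation = 371.63
overall_ratio = 0.89899  # the (396 : 4, 2) ratio found earlier

# "Simple direction" factor, a (396 : 1, 2) ratio.
simple_direction = ss_t_residual / ss_t_residual_plus_deviation

# "Partial collinearity" factor, a (395 : 3, 2) ratio.
partial_collinearity = overall_ratio / simple_direction

# F test with 6 and 782 degrees of freedom (s = 2, so the square root is taken).
root = math.sqrt(partial_collinearity)
f_collinearity = (782 / 6) * (1 - root) / root

print(f"simple direction     = {simple_direction:.5f}")
print(f"partial collinearity = {partial_collinearity:.5f}")
print(f"F (6, 782)           = {f_collinearity:.2f}")
```

The product of the two factors recovers the over-all ratio by construction, which is the sense in which the over-all criterion "consists of two factors."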

(iii)  Test  for  direction 

To  test  whether  the  direction  of  the  hypothetical  discriminator  t  is 
concordant  with  the  data,  we  must  analyze  the  regression  function  T,  this 
time  after  eliminating  the  effect  of  the  x  variables  which  are  orthogonal  to 
it  in  the  whole  sample.  The  sums  of  squares  for  T,  reduced  thus,  are 
determined  both  for  the  residual  variation  and  for  the  residual  plus 
deviation  variation.  The  difference  provides  the  criterion  for  testing 
departure  of  t  in  direction. 

The actual calculation of the reduced sums of squares may be simplified
by means of a device developed in Section 3.16. Suppose that the adjusted
residual regression function of $T$ on the $x_i$ is

$$\sum_i a_i x_i.$$

Now the condition that $T$ is uncorrelated with this function for the sample
as a whole gives

$$\sum_h \sum_i a_i b_h t_{hi} = 0,$$

or

$$\sum_i a_i p_i = 0.$$

Then the reduced residual sum of squares is

$$\sum_h \sum_i (b_h - a_h)(b_i - a_i) v_{hi},$$

where the $a_i$ are subject to this condition. Following the method given in
Section 3.16, we find the minimized sum of squares to be

$$\frac{\left(\sum_i b_i p_i\right)^2}{\sum_h \sum_i p_h p_i v^{hi}}.$$

Here the $v_{hi}$ are the sums of squares and products, within groups, of the
$x$ variables, and the $v^{hi}$ are the elements of the inverse matrix.

In exactly the same way, if the $q_{hi}$ are the sums of squares and products
for residual plus deviation variation, the corresponding reduced sum of
squares of $T$ is

$$\frac{\left(\sum_i b_i p_i\right)^2}{\sum_h \sum_i p_h p_i q^{hi}}.$$

Thus the partial direction criterion is

$$\frac{\sum_h \sum_i p_h p_i q^{hi}}{\sum_h \sum_i p_h p_i v^{hi}}.$$

A further simplification results from the fact that

$$\sum_h \sum_i p_h p_i q^{hi} = \frac{|T|}{|Q|} \sum_h \sum_i p_h p_i t^{hi} = \frac{S(t - \bar{t})^2 \sum_i b_i p_i}{S(t - \bar{t})^2 - \sum_i b_i p_i},$$

($|T|$ here being the determinant of the matrix of the $t_{hi}$),
obviating the need for calculating the inverse elements $q^{hi}$.

Equivalent,  though  rather  more  elaborate,  derivations  have  been  given 
by  Bartlett  (1951)  and  Williams  (1955);  they  are  not,  however,  easily 
applicable  to  the  present  example. 

In Table 10.16 the inverse of the matrix of residual sums of squares and
products is given, together with the derived sums of products used in forming
$\sum_h \sum_i p_h p_i v^{hi}$.

TABLE 10.16
Inverse of Residual Matrix Given in Table 10.15

              $10^{-6} \times v^{hi}$                    $\sum_i p_i v^{hi}$

   111.806      4.158    -25.381    -24.863          0.082337
     4.158    121.077    -30.299    -27.861         -0.159390
   -25.381    -30.299    279.306    -26.555          0.158416
   -24.863    -27.861    -26.555    131.559         -0.086042

$\sum_h \sum_i p_h p_i v^{hi} = 411.56$

Hence the partial direction criterion is

$$\frac{4307.67 \times 371.63}{3936.04 \times 411.56} = 0.98824,$$

a (393 : 1, 2) ratio. To test this value we have

$$F = \frac{391}{2} \cdot \frac{1 - 0.98824}{0.98824} = 2.33 \quad \text{(not significant).}$$
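The arithmetic of the partial direction criterion can be sketched as follows. The assignment of the quoted numbers to symbols is an interpretation rather than something stated outright in the text: 4307.67 is read as $S(t - \bar{t})^2$ and 371.63 as $\sum_i b_i p_i$, which is consistent with the denominator 3936.04 = 4307.67 - 371.63. The variable names are illustrative.

```python
# Quantities as quoted in the text; the symbol assignments are assumptions.
s_tt = 4307.67    # read as S(t - t_bar)^2
sum_bp = 371.63   # read as sum_i b_i p_i
ppv = 411.56      # sum_h sum_i p_h p_i v^{hi}, from Table 10.16

# Simplified form of sum_h sum_i p_h p_i q^{hi}, avoiding the inverse of Q.
ppq = s_tt * sum_bp / (s_tt - sum_bp)

# Partial direction criterion, a (393 : 1, 2) ratio.
criterion = ppq / ppv

# F test with 2 and 391 degrees of freedom.
f_partial_direction = (391 / 2) * (1 - criterion) / criterion

print(f"criterion = {criterion:.5f}, F (2, 391) = {f_partial_direction:.2f}")
```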

This  test  of  partial  direction  has  been  carried  out  to  demonstrate  the  method 
of  deriving  such  tests.  In  practice,  if  we  may  assume  that  the  populations  are 
collinear,  we  may  then  test  the  simple  direction  effect.  This  procedure  is 
analogous to that for estimating and testing the constants in a multiple
classification, on the assumption that interactions do not exist. The interaction sum
of  squares  (analogous  to  the  partial  collinearity  criterion)  there  provides  the 
valid  test  for  the  existence  of  interactions.  If  interactions  do  not  exist,  the 
estimates  of  the  constants  and  the  sum  of  squares  ignoring  interactions  (analogous 
to  the  simple  direction  criterion)  would  be  used  in  any  tests. 

For  the  test  of  direction  we  therefore  have,  as  before, 

(396:1,2)  =0.90359. 

This may be transformed to an $F$ ratio with 2 and 394 degrees of freedom:

$$F = \frac{394}{2} \cdot \frac{1 - 0.90359}{0.90359} = 21.02 \quad \text{(significant at 1 per cent level).}$$

Thus  we  conclude  that,  although  a  single  discriminator  is  adequate  to  represent 
the  variation  in  the  four  factors  among  the  groups,  time  itself  is  not  a  satisfactory 
discriminator;  in  other  words  there  is  significant  departure  from  linearity  of 
regression  on  time. 

10.8     COMMENTS  ON  THE  ANALYSIS 

It  is  interesting  to  compare  the  method  of  analysis  outlined  in  Example 
10.4  with  that  given  by  Bartlett  (1947).  In  that  analysis,  a  discriminant 
function in the $x_i$ was determined to maximize the ratio of the regression
sum  of  squares  to  the  residual  sum  of  squares,  any  other  variation  between 
groups  (i.e.,  deviation  from  regression)  being  ignored.  The  analysis  is 
therefore  appropriate  for  testing  the  existence  of  a  relationship,  or  of 
discrimination  among  series.  The  present  analysis,  on  the  other  hand, 
gives a direct multiple regression of time on the $x_i$, which is appropriate
for  testing  departure  from  regression,  due  to  departure  either  in  direction 
or  from  collinearity.  In  most  cases  the  existence  of  the  relationship  may 
be  taken  for  granted,  so  that  it  is  relevant  to  test  only  the  adequacy  of  the 
given function (in this case time) to represent the variation between
series,  by  testing  departure  from  regression. 


10.9    GENERAL  COMMENTS 

As  in  multiple  regression,  so  in  discriminant  analysis  the  work  of 
calculation  increases  rapidly  as  the  number  of  variables  increases. 
Fortunately,  however,  we  seldom  find  it  necessary  in  practice  to  include 
more  than  four  variables  in  either  set.  If  we  make  a  satisfactory  choice  of 
variables,  they  will  usually  account  for  a  major  part  of  the  association, 
any additional variables making a nonsignificant contribution.
Moreover, in discriminant analysis, the more variables included, the more
likely  it  is  that  significant  departures  from  collinearity  will  appear,  so  that 
the  method  will  no  longer  be  applicable  (this  is  not,  of  course,  a  reason  for 
excluding  variables  but  a  difficulty  that  will  arise  if  they  are  included). 

The  examples  included  give  some  indication  of  the  practical  applications 
of  discriminant  analysis.  As  has  been  shown,  even  in  complicated 
examples  the  mathematical  theory  and  the  calculations  are  not  too 
difficult.  All  the  tests  that  are  relevant  for  practical  purposes  are  based 
on  the  null  distribution  either  of  variance  ratios  or,  in  the  more  complex 
cases,  of  determinantal  ratios.  Thus,  the  far  more  difficult  theory  of  the 
nonnull  distributions,  although  possibly  important  from  a  mathematical 
point  of  view,  is  not  relevant  to  these  practical  problems. 


CHAPTER    11 


Functional  Relations 


11.1    REGRESSION  AND  FUNCTIONAL  RELATIONS 

As  has  been  explained  in  Chapter  1,  there  are  important  differences 
between  regression  relations,  which  express  the  expected  value  of  one 
variable  in  terms  of  the  observed  values  of  other  variables,  and  functional 
relations,  which  subsist  between  the  expected  values  of  different  variables 
and  will  not  therefore  coincide  with  regression  relations  unless  the 
independent  variables  are  free  from  error.  In  general,  functional  relations 
are  relations  among  the  parameters  of  the  distributions  of  different 
variables;  but  since  we  are  here  considering  only  linear  relations  among 
variables  whose  errors  are  normally  distributed,  we  may  describe  the 
relations  as  relations  among  expected  values. 

When  the  independent  variables  are  errorless,  their  observed  and 
expected  values  coincide,  so  that  the  regression  relation  is  the  same  as  the 
functional  relation,  and  both  may  be  estimated  by  the  method  of  least 
squares.  In  many  of  the  examples  discussed  in  previous  chapters,  the 
independent  variables  have  been  virtually  free  from  error,  so  there  has 
been  no  need  to  distinguish  between  the  functional  and  the  regression 
relation.  (See,  for  example,  the  comparison  of  different  theoretical 
relations,  discussed  in  Chapter  6.)  When  the  independent  variables  are 
subject  to  error,  various  special  methods,  to  be  described  later,  need  to  be 
used  to  estimate  the  functional  relation.  A  detailed  discussion  of  the 
estimation  of  linear  functional  relations  has  been  given  by  Lindley  (1947). 

Not  only  do  linear  functional  relations  differ  from  regression  relations 
in  general,  but  they  also  have  different  applications.  The  regression 
relation is the more generally useful, relating as it does to observed values;
its  main  application  is  to  the  prediction  of  either  observed  or  expected 
values  of  one  variable  from  observed  values  of  the  others.  In  most 
practical  applications  the  regression  and  not  the  functional  relation  is 
required. 

Before  considering  methods  of  estimation,  therefore,  we  need  to  consider 
in  what  circumstances  the  functional  relation  is  relevant.     Thus  a  theory 
may specify some relation among the underlying or expected values of
certain  variables.  It  would  then  be  of  interest  to  determine  whether  the 
data  support  the  specified  form  of  relationship,  as  well  as  to  estimate  the 
parameters  of  the  relationship  or  to  check  the  concordance  of  given 
parametric  values  with  the  observations.  It  is  only  rarely  that  the  form 
of  such  a  relationship  can  be  deduced  from  observed  values,  although 
the  form,  once  decided  from  theoretical  considerations,  can  often  be 
verified  or  contradicted  by  the  observations. 

The  regression  relations  are  based  on  the  variation  in  both  the  "true" 
values  and  the  random  errors  to  which  the  observations  are  subject,  the 
functional  relation  on  the  variation  in  the  "true"  values  alone.  A 
little reflection will show (and examination of the literature will confirm;
see,  for  example,  Haavelmo  (1943))  that  the  functional  relation  is  therefore 
relevant  only  to  a  study  of  how  the  "true"  values  of  both  variables  are 
affected  by  some  extraneous  variable  or  variables;  that  is  to  say,  the 
relationship  shows  what  elements  of  the  system  are  invariant  under 
changes  in  conditions.  It  is  not  of  interest  to  know  the  underlying 
relation  (if  any)  between  two  variables  when  each  is  affected  only  by 
random  error;  usually  what  is  then  wanted  is  one  or  the  other  of  the 
regression  relations. 

In  calibration  experiments,  where  one  measuring  instrument  is  being 
checked  against  another,  the  linear  functional  relation  between  the  results 
given by the two instruments is required. For we are here concerned with
the  underlying  relation  between  the  results,  persisting  through  changes  in 
conditions  (in  this  case,  the  properties  of  the  materials  being  measured), 
and  regardless  of  the  random  errors  to  which  the  results  may  be  subject. 
Generally  it  will  be  required  that  one  instrument  be  capable  of  replacing 
the  other  under  a  wide  range  of  conditions.  For  example,  in  paper 
testing,  the  tear  tester  measures  an  arbitrarily  defined  property  of  the  paper. 
The  calibration  of  a  new  instrument  against  the  standard  requires  that  the 
new  instrument  give  readings  for  various  weights  of  paper  that  are  roughly 
linearly  related  to  those  given  by  the  standard.  Clearly,  the  greater  the 
range  of  qualities  and  weights  of  paper,  the  less  the  relative  contribution 
of  experimental  error  to  the  total  variation  of  the  results,  and  the  more 
closely the two regression equations will approach the functional relation;
but  since,  in  general,  each  regression  will  differ  from  the  functional 
relation,  it  is  preferable  to  determine  the  latter  directly. 

Since  the  values  of  regression  coefficients  are  affected  by  the  magnitudes 
of  the  errors  in  the  independent  variables,  these  coefficients  are  useless  for 
examining  the  concordance  of  a  set  of  data  with  a  theoretical  relationship; 
for  this  purpose  the  functional  relation  needs  to  be  estimated.  Moreover, 
in  different  sets  of  data  in  which  the  same  underlying  relationship  is 
believed to hold, the regression coefficients may be affected to varying
extents by the presence of random variation in the independent variables;
the  magnitude  of  this  effect  depends  on  the  ratio  of  the  variance  introduced 
by  random  error  to  the  variance  of  the  "true"  values.  In  such  cases  there 
will  be  differences  among  the  regression  coefficients  that  are  not  a  reflection 
of  any  variation  in  the  underlying  relationship.  In  this  case,  too,  the 
functional  relation  needs  to  be  estimated  for  each  set  of  data  in  order  to 
test  its  constancy  over  all  sets. 
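The dependence of a regression coefficient on the error in the independent variable can be illustrated by simulation. The sketch below is not from the original text: it assumes normally distributed "true" values and errors, a hypothetical underlying slope of 2, and the standard attenuation result that the least-squares slope tends to the underlying slope divided by one plus the ratio of error variance to "true" variance. All names and parameter values are illustrative.

```python
import random


def ls_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulated_slope(error_sd, beta=2.0, true_sd=1.0, n=20000, seed=1):
    """Regression slope when the independent variable carries random error."""
    rng = random.Random(seed)
    true_x = [rng.gauss(0.0, true_sd) for _ in range(n)]
    observed_x = [t + rng.gauss(0.0, error_sd) for t in true_x]
    observed_y = [beta * t + rng.gauss(0.0, 0.5) for t in true_x]
    return ls_slope(observed_x, observed_y)


for error_sd in (0.0, 0.5, 1.0):
    # Expected slope: beta / (1 + error variance / "true" variance), with true_sd = 1.
    expected = 2.0 / (1.0 + error_sd ** 2)
    print(f"error s.d. {error_sd}: slope {simulated_slope(error_sd):.3f}, "
          f"expected about {expected:.3f}")
```

The same underlying relation thus yields different regression coefficients in data sets that differ only in the accuracy of the independent variable, which is the reason given above for estimating the functional relation when its constancy over sets is to be tested.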

Hitherto  we  have  been  discussing  the  estimation  of  the  constants  in  a 
linear  functional  relation,  in  which  the  variables  included,  although  subject 
to  error,  are  given.  In  other  uses  of  the  functional  relations,  the  problem 
of  estimation  arises  in  another  way.  Suppose,  for  example,  that  a  linear 
function,  such  as  an  "index  number,"  has  been  defined  for  a  certain  set  of 
variables.  The  constants  are  given,  but  it  is  desired  to  adjust  these 
constants,  if  possible,  to  allow  for  the  fact  that  the  different  variables  in 
the  linear  function  have  differing  accuracy.  This  problem  was  solved  by 
Yates  (1939a),  who  showed  that,  although  the  index  number  with 
unadjusted  constants  is  of  course  unbiased,  greater  accuracy  is  achieved 
if  the  multipliers  of  the  less  accurately  determined  variables  are  reduced 
numerically from their theoretical values. The method also has applications
in plant and animal breeding, where selection indexes are estimated
in  a  similar  way. 

11.2    ESTIMATION  OF  A  LINEAR  FUNCTIONAL 
RELATION 

It  is  well  known  that,  when  both  the  "true"  values  and  the  errors  are 
normally  distributed,  the  functional  relation  cannot  be  determined  from 
data.  Lindley  (1947),  Reiersol  (1950),  and  others  have  obtained  a  number 
of  results  which  show  that  the  nonnormality  of  one  of  the  distributions  is 
necessary  for  the  estimation  of  the  relationship  to  be  possible.  Thus  it 
may  be  taken  that  functional  relations  are  not  determinable  from  the 
internal  analysis  of  a  set  of  variables  with  normal  distributions.  Put  in 
another  way,  since  the  first-  and  second-order  sample  moments  summarize 
all  the  information  in  samples  from  normal  populations,  it  follows  that 
information  beyond  that  on  the  first-  and  second-order  moments  of  the 
distributions  must  be  available  if  the  functional  relation  is  to  be  determinable. 
This  information  may  be  present  in  the  sample,  if  the  distribution  is  not 
normal,  for  then  the  first-  and  second-order  sample  moments  are  no  longer 
sufficient  statistics  for  the  parameters  of  the  distribution;  alternatively, 
it  may  be  introduced  through  knowledge  of  the  relation  of  each  variable 
with  some  extraneous  variables.     As  mentioned  before,  it  is  only  in  such 
circumstances, when some such additional information exists, that the
functional  relation  is  of  interest;  in  other  words,  whenever  functional 
relations  are  of  practical  interest,  they  are  also  determinable  from  data. 
In  passing,  it  may  be  mentioned  that  Berkson's  (1950)  method  of 
"controlled  variables,"  as  elucidated  by  Lindley  (1953),  is  a  specification 
of  the  experimental  technique  which  enables  one  variable  to  be  dealt  with 
as  though  it  were  errorless.  Thus  the  estimation  of  the  functional  relation 
is  in  such  cases  reduced  to  the  estimation  of  a  regression  equation.  The 
method  will  be  discussed  further. 

(i)  Methods  Based  on  Properties  of  the  Distribution 

In  most  branches  of  applied  statistics  it  is  convenient  to  assume  that  the 
underlying  error  distribution  is  normal  because  it  simplifies  the  calculations 
without  greatly  affecting  the  validity  of  the  conclusions  drawn  from  the 
analysis.  However,  in  the  determination  of  functional  relations,  the 
assumption  of  normality  is  a  positive  handicap.  On  this  assumption  it  is 
not  possible  to  estimate  the  relationship  at  all. 

If  we  can  assume  some  form  of  departure  of  the  distribution  of  the 
variables  from  normality,  we  can  validly  estimate  the  relationship  (Neyman 
and  Scott,  1951).  Many  such  methods  use  moments  of  the  distributions 
higher  than  the  second.  However,  any  result  whose  validity  has  to  be 
based  on  the  assumption  of  a  certain  form  for  the  distribution,  or  even  on 
the  existence  of  certain  higher  moments  of  the  distribution,  is  not  fully 
satisfactory.  It  seems  preferable  to  base  the  estimation  on  methods  that 
make  minimal  assumptions  about  the  form  of  the  distribution — except 
where  for  computational  convenience  we  assume  normality.  We  need, 
in  fact,  to  provide  ourselves  with  information  additional  to  that  obtained 
from  observations  on  the  variables  we  are  investigating.  This  additional 
information  may  be  provided  in  various  ways,  some  of  which  we  now  give. 

(ii)  Ratio  of  Error  Variances  Known 

In  many  experiments,  for  example  those  on  the  calibration  of  a  new 
instrument  against  a  standard,  it  is  to  be  expected  that  the  error  variances 
of the readings given by the two instruments are equal; more generally,
in other cases the error variances may be in some ratio which is determinable
from physical considerations. In such experiments there is
effectively  but  one  error  to  be  determined,  and  valid  estimation  is  possible. 

Consider a sample of $n$ pairs of values of variables $x_1$, $x_2$, and assume,
without loss of generality, that their error variances are equal. If it is
assumed that their expected values are linearly related, this may be
expressed by saying that a linear function of $x_1$ and $x_2$ is distributed with
constant mean and unit variance. Such a linear function will be called a
null variate, because the systematic component of its variation vanishes.
Suppose that the null variate is

$$\xi = x_1 + \gamma x_2.$$

Then the variance of $\xi$ is proportional to

$$1 + \gamma^2.$$

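We must now minimize, with respect to $\gamma$, the sum of squares

$$\frac{S[(x_1 - \bar{x}_1) + \gamma(x_2 - \bar{x}_2)]^2}{1 + \gamma^2} = \frac{t_{11} + 2\gamma t_{12} + \gamma^2 t_{22}}{1 + \gamma^2},$$

which leads to the equation

$$\gamma^2 t_{12} + \gamma(t_{11} - t_{22}) - t_{12} = 0. \tag{11.1}$$

The two roots of this equation, which we denote by $c_1$ and $c_2$, represent
orthogonal directions, so that the sums of squares of the corresponding
linear forms constitute a partition of the sums of squares of $x_1$ and $x_2$
about their means, namely $t_{11} + t_{22}$. Since $c_2 = -1/c_1$, we have

$$\frac{c_1^2 t_{11} - 2c_1 t_{12} + t_{22}}{1 + c_1^2} + \frac{t_{11} + 2c_1 t_{12} + c_1^2 t_{22}}{1 + c_1^2} = t_{11} + t_{22}. \tag{11.2}$$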


The  two  terms  on  the  left-hand  side  of  (11.2)  are  the  maximum  and  the 
minimum  sum  of  squares  for  any  linear  form,  which  may  be  identified 
with  the  sum  of  squares  among  the  true  values  and  the  sum  of  squares  of 
residuals, respectively. The relevant estimate $c_1$ is the one that minimizes
the  residual  sum  of  squares. 
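Equations (11.1) and (11.2) lend themselves to a short numerical check. The sketch below solves the quadratic, labels as $c_1$ the root giving the smaller normalized sum of squares, and confirms that the two roots multiply to $-1$ and that the two sums of squares add to $t_{11} + t_{22}$. The function names and the input values (10, 6, 8) are hypothetical, chosen only for illustration.

```python
import math


def normalized_ss(gamma, t11, t12, t22):
    """Sum of squares of x1 + gamma*x2, divided by 1 + gamma^2."""
    return (t11 + 2 * gamma * t12 + gamma * gamma * t22) / (1 + gamma * gamma)


def functional_roots(t11, t12, t22):
    """Roots of gamma^2*t12 + gamma*(t11 - t22) - t12 = 0, equation (11.1).

    Returns (c1, c2) with c1 the root that minimizes the normalized sum of squares.
    """
    if t12 == 0:
        raise ValueError("t12 must be nonzero for the quadratic to have two roots")
    d = t11 - t22
    disc = math.sqrt(d * d + 4 * t12 * t12)
    roots = ((-d + disc) / (2 * t12), (-d - disc) / (2 * t12))
    c1, c2 = sorted(roots, key=lambda g: normalized_ss(g, t11, t12, t22))
    return c1, c2


# Hypothetical sums of squares and products, for illustration only.
t11, t12, t22 = 10.0, 6.0, 8.0
c1, c2 = functional_roots(t11, t12, t22)
ss_min = normalized_ss(c1, t11, t12, t22)
ss_max = normalized_ss(c2, t11, t12, t22)

print(f"c1 = {c1:.4f}, c2 = {c2:.4f}, c1*c2 = {c1 * c2:.4f}")
print(f"minimum + maximum = {ss_min + ss_max:.4f}, t11 + t22 = {t11 + t22:.4f}")
```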

Although  equation  (11.1)  gives  the  appropriate  estimate  of  the  constant 
in  the  relationship,  it  does  not  enable  fiducial  limits  for  the  constant  to  be 
determined.  These  limits  may  be  determined  in  the  following  way. 
Since $x_1$ and $x_2$ are subject to independent errors of equal variance, it
follows that the null variate

$$\xi = x_1 + \gamma x_2$$

is uncorrelated with the variate

$$\xi' = \gamma x_1 - x_2,$$

provided the null hypothesis is true. Hence the sample regression of
$\xi$ on $\xi'$ should differ from zero only by sampling error. Accordingly, if
the regression of $\xi$ on $\xi'$ is significant, it indicates that the hypothetical
value of $\gamma$ is not concordant with the data. It may be verified that the
estimates $c_1$ and $c_2$ are those values of $\gamma$ for which the regression of $\xi$ on $\xi'$
vanishes. We then have the analysis of variance of $\xi$ shown in Table 11.1.
It will be noted that this analysis has the same form as the analysis for
testing the abscissa of concurrent regression lines (Chapter 8). It is, in
fact, another example of a general type of analysis for problems of this
kind, in which the null hypothesis specifies not only a null variate but in
effect an "explanatory variate," in the sense of covariance analysis, as well.

TABLE 11.1

                               D.F.      Sum of Squares

Constant (regression on $\xi'$)    1       $\dfrac{[\gamma^2 t_{12} + \gamma(t_{11} - t_{22}) - t_{12}]^2}{\gamma^2 t_{11} - 2\gamma t_{12} + t_{22}}$

Residual                       $n - 2$   $\dfrac{(1 + \gamma^2)^2 (t_{11} t_{22} - t_{12}^2)}{\gamma^2 t_{11} - 2\gamma t_{12} + t_{22}}$

Total                          $n - 1$   $t_{11} + 2\gamma t_{12} + \gamma^2 t_{22}$

Example  11.1  The  Comparison  of  Two  Measures  of  a  Strength 
Property  of  Timber.  The  Izod  test  is  a  test  of  the  impact  strength  of 
materials.  The  test  was  applied  to  specimens  of  wood  cut  from  109  planks  of 
Northern  Silver  Ash;  each  plank  provided  two  specimens,  of  which  one  was 
tested  radially,  the  other  tangentially  to  the  growth  rings.  The  question  at 
issue  was  whether  the  systematic  components  of  radial  Izod  and  tangential  Izod 
differed  significantly  in  magnitude.  In  making  this  test  it  was  assumed  that  the 
variance  due  to  random  variations  was  the  same  for  each  variable;  this  is 
reasonable,  since  both  result  from  the  same  method  of  test.  The  sums  of  squares 
and  products  of  the  test  results  were  as  follows : 

$t_{11} = 1232$
$t_{12} = 1086$
$t_{22} = 1543$,

each  with  108  degrees  of  freedom,  the  suffixes  1  and  2  referring  to  radial  and 
tangential  Izod  respectively.  For  estimating  the  constant  in  the  relationship  we 
have  the  equation 

$$1086\gamma^2 - 311\gamma - 1086 = 0,$$

giving $c_1 = -0.8670$ and $c_2 = +1.1534$.

The value $c_1$ is clearly the required coefficient of proportionality, showing that
the systematic component of $x_1$ is somewhat less than that of $x_2$. To test the
significance of departure of this value from the hypothetical value $-1$, we
substitute in the formal analysis of variance just given. Then the total sum of
squares is

$$1232 - 2(1086) + 1543 = 603,$$

and the sum of squares for departure of the constant from its hypothetical value is

$$\frac{(1232 - 1543)^2}{1232 + 2(1086) + 1543} = \frac{311^2}{4947} = 19.55.$$

The  analysis  of  variance  is  shown  in  Table  11.2. 

TABLE  11.2 

             D.F.     Sum of Squares     Mean Square

Constant       1           19.55            19.55
Residual     107          583.45             5.453

Total        108          603


The F ratio, with 1 and 107 degrees of freedom, is 3.59, not significant, so that
we  are  not  justified  in  assuming  that  the  systematic  components  of  the  two 
variables  differ  in  magnitude.  Whether  the  variables  can  be  regarded  as 
equivalent  in  other  respects  is,  however,  beyond  the  scope  of  the  present  test 
and  cannot  be  answered  from  these  data. 
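
The arithmetic of Example 11.1 can be retraced with a short script. This is only a sketch: the function name and the returned dictionary keys are illustrative choices, the formulas are those of Table 11.1 with the null variate $x_1 + \gamma x_2$, and it assumes equal error variances and a nonzero $t_{12}$.

```python
import math


def equal_variance_anova(t11, t12, t22, n, gamma0):
    """Estimate gamma and test gamma0 for the null variate x1 + gamma*x2.

    Assumes equal error variances in x1 and x2 and t12 != 0 (hypothetical
    helper, not part of the original text).
    """
    # Estimates c1, c2: roots of t12*g**2 + (t11 - t22)*g - t12 = 0.
    root = math.sqrt((t11 - t22) ** 2 + 4.0 * t12 ** 2)
    c1 = (-(t11 - t22) - root) / (2.0 * t12)
    c2 = (-(t11 - t22) + root) / (2.0 * t12)

    # Analysis of variance of xi = x1 + gamma0*x2 (Table 11.1).
    total = t11 + 2.0 * gamma0 * t12 + gamma0 ** 2 * t22
    cross = gamma0 ** 2 * t12 + gamma0 * (t11 - t22) - t12
    ss_prime = gamma0 ** 2 * t11 - 2.0 * gamma0 * t12 + t22
    constant = cross ** 2 / ss_prime
    residual = total - constant
    f_ratio = constant / (residual / (n - 2))
    return {
        "c1": c1,
        "c2": c2,
        "total": total,
        "constant": constant,
        "residual": residual,
        "f_ratio": f_ratio,
    }


# Sums of squares and products from Example 11.1 (109 planks).
result = equal_variance_anova(1232.0, 1086.0, 1543.0, n=109, gamma0=-1.0)
for name, value in result.items():
    print(f"{name:>9}: {value:.4f}")
```

The two roots multiply to $-1$, which reflects the fact that they correspond to a pair of perpendicular directions; only one of them is the coefficient of proportionality.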

It  will  be  seen  later  that  the  situation  in  which  the  ratio  of  variances  is 
known is a particular case of a much more commonly occurring one, in
which  the  variances  of  the  random  components  are  estimated  from  residual 
sums  of  squares.  If  the  systematic  variance  in  the  variables  is  associated 
with  some  groupings  of  the  data,  or  with  some  extraneous  variables,  the 
sums  of  squares  within  groups,  or  about  the  regression  on  these  extraneous 
variables,  will  provide  estimates  of  the  random  component  of  the  variances. 
The  analysis  in  these  situations  is  similar,  allowance  having  to  be  made  for 
the  errors  in  estimation  of  these  residual  variances. 

It  will  also  be  noted  that  the  problem  of  finding  fiducial  limits  of  a 
ratio,  which  was  discussed  in  Chapter  6,  is  really  a  special  case  of  the 
determination  of  a  functional  relation,  in  which  there  is  no  constant  term. 

(iii)  Error  Variances  Known 

When  the  error  variance  of  one  of  the  variables  (say,  x2)  is  known,  we 
can  determine  a  consistent  estimate  of  the  coefficient  in  the  functional 
relation. On the assumption that the errors in $x_1$ and $x_2$ are independent,
it is clear that the sum of products of $x_1$ and $x_2$ is an unbiased estimate of
the  sum  of  products  of  the  "true  values."  On  the  other  hand,  the  sum  of 
squares  of  x2  will  be  inflated  by  an  amount  proportional  to  the  error 
variance  and  the  degrees  of  freedom.  Hence,  if  the  error  variance  is 
known, the sum of squares may be adjusted to give an unbiased estimate
of the sum of squares of the "true values." Thus, if the variance is $\sigma^2$,
the adjusted sum of squares for a sample of size n will be

$$t_{22} - (n - 1)\sigma^2$$

and the estimate of the constant $\gamma$ will be

$$c = -t_{12}/[t_{22} - (n - 1)\sigma^2].$$

This  method  is  satisfactory  only  when  n  is  large,  for  the  adjusted  sum 
of  squares  may  sometimes  be  negative  as  a  result  of  sampling  fluctuations, 
and  the  probability  of  this  occurrence  will  be  negligible  only  when  n  is 
large. 
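
A minimal sketch of this adjustment follows. The function name and the numerical inputs are hypothetical, chosen only to show the calculation and the small-sample failure mentioned above; returning `None` when the adjusted sum of squares is not positive is a design choice of this sketch, not a rule from the text.

```python
def adjusted_constant(t12, t22, n, sigma2):
    """Estimate c = -t12 / [t22 - (n - 1)*sigma2] when the error variance
    sigma2 of x2 is known. Returns None if the adjusted sum of squares
    is not positive, in which case the estimate is unusable."""
    adjusted = t22 - (n - 1) * sigma2
    if adjusted <= 0:
        return None
    return -t12 / adjusted


# Hypothetical sums of squares and products for a sample of 11 observations.
print(adjusted_constant(-40.0, 50.0, 11, 1.0))   # adjusted estimate
print(-(-40.0) / 50.0)                           # unadjusted ratio, for comparison
print(adjusted_constant(-40.0, 50.0, 11, 6.0))   # adjustment exceeds t22
```

With these illustrative figures the unadjusted ratio is smaller in magnitude than the adjusted estimate, since the inflated sum of squares of $x_2$ appears in the denominator.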

This  method  of  adjusting  for  the  errors  in  variables  may  be  adapted 
for  use  when  estimates  of  the  error  variances  are  given.  When  the 
variation  can  be  analyzed  into  the  variance  within  and  between  groups, 
the  mean  square  between  groups  may  be  adjusted  by  deducting  the  mean 
square  within  groups;  the  difference  generally  provides  an  unbiased 
estimate  of  the  variance  of  the  "true  values,"  provided,  of  course,  that 
these  are  associated  with  group  differences.  This  method  has  found 
some  applications  in  genetics  (see,  for  example,  Smith,  1936). 

Another  potential  application  in  economics  and  other  fields  has  been 
discussed  by  Yates  (1939a).  In  setting  up  an  index  number  to  measure  a 
trend  or  other  characteristic  of  a  series,  we  often  use  a  linear  combination 
of  several  variables  with  given  weights.  If  these  variables  are  inaccurately 
measured,  the  question  arises  whether  some  allowance  can  be  made  by 
adjustment  of  the  weights  for  the  differing  accuracies  of  different  variables. 
Yates  shows  that  if  the  error  variances  and  the  variances  of  the  true  values 
are  known,  a  suitable  adjustment  is  possible.  Roughly  speaking,  its 
effect  is  to  reduce  the  weights  attached  to  the  less  accurate  variables. 
Needless  to  say,  in  making  such  an  analysis,  the  underlying  assumptions 
must  be  borne  in  mind.  Thus,  with  time  series  data,  it  may  not  be 
appropriate  to  consider  the  true  values  as  having  come  from  a  single 
population,  since  conditions  may  change  systematically  over  the  period. 
Nevertheless,  some  over-all  improvement  is  possible  in  such  an  index  by 
adjusting  the  coefficients  suitably. 

The  method  of  adjustment  is  to  choose  the  linear  function  of  the 
observed  values  so  that  the  variance  of  the  difference  between  that  linear 
function  and  the  index  based  on  the  true  values  is  a  minimum.  This 
analysis  is,  of  course,  possible  only  if  some  estimates  are  available  of  the 
variances  and  covariances  of  the  true  values.  These  can  often  be  obtained 
from  an  analysis  of  data  that  permit  classification  into  several  groups 
corresponding  to  different  true  values. 
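
The minimization can be written out explicitly. The following is a sketch in assumed notation, not taken from Yates: $x = \tau + e$ is the vector of observed values, $\tau$ the true values with covariance matrix $\Sigma_\tau$, $e$ the errors with covariance matrix $\Sigma_e$ (taken to be uncorrelated with $\tau$), $w$ the given index weights, and $b$ the adjusted weights applied to the observed values.

```latex
\begin{aligned}
\operatorname{Var}(b'x - w'\tau)
  &= (b - w)'\,\Sigma_\tau\,(b - w) + b'\,\Sigma_e\,b, \\
\text{setting the derivative with respect to } b \text{ to zero:}\quad
  & \Sigma_\tau (b - w) + \Sigma_e b = 0, \\
b &= (\Sigma_\tau + \Sigma_e)^{-1}\,\Sigma_\tau\, w .
\end{aligned}
```

When the variables are mutually uncorrelated this reduces to $b_i = w_i\,\sigma_{\tau i}^2 / (\sigma_{\tau i}^2 + \sigma_{e i}^2)$, so that the weight of a less accurately measured variable is reduced, in line with the effect described above.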

(iv)  Controlled  Variables

A  situation  that  must  frequently  arise  in  practice  is  one  in  which  the 
independent  variables,  although  subject  to  error,  are  controlled.  Here, 
although  the  variables  are  subject  to  error,  the  linear  functional  relation 
can  be  estimated  without  bias  from  the  regression  equation. 

When  the  independent  variable  is  an  observation  such  as  a  meter 
reading which is subject to random errors of measurement, the experimenter,
in trying to take observations corresponding to assigned values
of  the  independent  variable,  will  be  able  to  control  the  observed  values 
but  not  the  true  values.  For  example,  in  applying  a  load  to  a  test  specimen 
as  in  determining  the  relation  of  load  and  time  to  failure,  the  load  may  be 
set  at  some  predetermined  level.  The  observed  load  is  then  fixed,  but 
the  actual  load  may  vary  about  that  observed  as  a  result  of  experimental 
errors,  inaccuracies  in  the  setting  and  in  the  dimensions  of  the  specimen, 
and  so  on.  In  such  cases,  however,  although  the  measurements  are 
subject  to  error,  the  errors  are  uncorrelated  with  the  observed  values, 
being  in  fact  correlated  with  the  true  values.  The  model  of  variation  is 
thus  different  from  that  usually  specified  in  applications  of  the  method 
of  least  squares.  It  can  then  be  shown  that  the  regression  coefficients 
are  unbiased  estimates  of  the  constants  in  the  linear  functional  relation. 
The  variances  of  the  regression  coefficients  are  found  in  the  usual  way. 
Since  the  variances  of  the  observed  values  will  be  less  than  the  variances 
among  the  true  values,  the  variances  of  the  regression  coefficients  will  in 
general  be  greater  than  they  would  have  been  had  the  variables  been 
errorless. 
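
The result can be seen from a short derivation for a single independent variable. The notation is assumed for illustration and does not come from the text: $X$ is the controlled (observed) value, $\tau$ the true value, $\delta$ the error of setting, $\varepsilon$ the error in the dependent variable $y$, and $b$ the least-squares regression coefficient of $y$ on $X$; $\delta$ and $\varepsilon$ are taken to be independent with zero means.

```latex
\begin{aligned}
y &= \alpha + \beta\tau + \varepsilon, \qquad \tau = X + \delta, \qquad
     \operatorname{Cov}(X,\delta) = \operatorname{Cov}(X,\varepsilon) = 0, \\
y &= \alpha + \beta X + (\beta\delta + \varepsilon), \\
E(b) &= \beta, \qquad
\operatorname{Var}(b) = \frac{\beta^2\sigma_\delta^2 + \sigma_\varepsilon^2}{\sum (X - \bar{X})^2}.
\end{aligned}
```

The composite error $\beta\delta + \varepsilon$ has zero mean and is uncorrelated with $X$, so the ordinary regression estimates $\beta$ without bias; its variance exceeds $\sigma_\varepsilon^2$, which is why the coefficient is less precisely determined than it would be with an errorless variable.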

Regression  equations  based  on  controlled  variables  can  be  used  for 
prediction.  Needless  to  say,  in  later  applications  of  the  regression 
equation  the  independent  variables  would  need  to  be  controlled  variables. 
In  application  to  inverse  estimation  the  estimates  and  fiducial  limits 
obtained  would  apply  to  the  values  of  the  controlled  variables  (e.g.,  meter 
readings),  which  are  what  would  normally  be  required;  fiducial  limits  for 
the  true  values,  if  required  for  any  reason,  would  be  somewhat  wider. 

The  distinction  between  controlled  variables  and  variables  otherwise 
subject  to  error  has  been  noted  comparatively  recently.  This  may  be  due 
to  the  fact  that  variables  are  most  frequently  thus  controlled  in  the  physical 
sciences,  where  it  has  been  found  satisfactory  to  treat  the  independent 
variables  as  errorless. 

(v)  The  Method  of  Grouping 

Another  method  for  determining  the  functional  relation  between  two 
variables, described by Wald (1940) and Bartlett (1949), uses groupings of
the variables. The basis of the method is that, if the values can be
separated  into  a  few  large  distinct  groups,  the  means  of  the  variables 
within  each  group  will  be  little  affected  by  random  variation,  and  the 
differences  among  the  group  means  will  be  due  to  systematic  variation. 
Wald  divides  the  data  into  two  groups  of  equal  size  according  to  the 
magnitude  of  values  of  one  of  the  independent  variables,  and  takes  the 
line  joining  the  points  of  means  of  the  two  groups  as  the  estimate  of  the 
functional  relation.  Bartlett  derives  an  estimate  of  the  constant  of 
proportionality  with  somewhat  smaller  variance  by  dividing  the  data  into 
three  equal  or  nearly  equal  groups  in  the  same  way  and  determining  the 
slope  of  the  line  joining  the  points  of  means  of  the  extreme  groups. 
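
Both grouping estimates can be sketched with one small function. The function name and the data are hypothetical; the slope returned is that of $x_2$ on $x_1$, the groups are formed on the observed values of $x_1$, and any observations left over after forming equal extreme groups are simply placed in the middle, which is an assumption of this sketch.

```python
def extreme_group_slope(x1, x2, n_groups):
    """Slope of x2 on x1 from the means of the lowest and highest groups.

    n_groups=2 follows Wald's two-group scheme; n_groups=3 follows
    Bartlett's scheme, which uses only the two extreme thirds.
    """
    pairs = sorted(zip(x1, x2))        # order by observed x1
    size = len(pairs) // n_groups      # leftover points fall in the middle
    low, high = pairs[:size], pairs[-size:]

    def mean(values):
        return sum(values) / len(values)

    d1 = mean([a for a, _ in high]) - mean([a for a, _ in low])
    d2 = mean([b for _, b in high]) - mean([b for _, b in low])
    return d2 / d1


# Hypothetical observations scattered about the line x2 = 1 + 2*x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]
print("Wald (two groups):      ", extreme_group_slope(x1, x2, 2))
print("Bartlett (three groups):", extreme_group_slope(x1, x2, 3))
```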

These  methods  are  open  to  the  objection,  not  always  serious  in  practice, 
that  the  group  limits  are  based  on  the  observed  values  and  not  on  some 
external  criterion  defining  systematic  differences.  Neyman  and  Scott 
(1951)  have  shown  that  only  "in  very  exceptional  circumstances"  does  the 
method  provide  consistent  estimates.  Roughly  speaking,  the  method 
leads  to  consistent  estimates  if  the  separation  of  values  of  one  of  the 
variates  is  sufficiently  wide  in  the  neighborhood  of  the  group  limits  that  a 
grouping  based  on  observed  values  is  equivalent  to  a  grouping  based  on 
true  values.  Clearly,  under  these  conditions,  the  differences  between 
groups  may  be  attributed  to  some  extraneous  variates.  Hence,  the 
method  of  grouping,  when  valid,  is  best  considered  as  one  of  the  methods 
in which the systematic variation of each of the variables under consideration
is associated with extraneous variables. These methods we shall now
consider. 

11.3    ESTIMATION  BY  MEANS  OF  INSTRUMENTAL 
VARIABLES 

It was pointed out in Section 11.2 that functional relationships among a
set of variables are most often of interest when they reflect the elements of
the set that are invariant under changes in extraneous variables. Consequently,
in many practical studies of functional relations, the relations of
the variables under study with some extraneous variables are examined.
An  appropriate  method  of  estimating  functional  relations  is  to  determine 
the  linear  functions  of  the  set  that  are  uncorrelated  with  the  extraneous 
variables.  These  linear  functions  will  then  define  the  relationships.  The 
extraneous  variables  will  be  called  instrumental  variables  (Reiersol,  1941, 
1945)  to  distinguish  them  from  the  investigational  variables. 

We  shall  henceforth  consider  only  the  estimation  of  a  single  relationship. 
Bartlett  (1948)  has  given  an  interesting  example  of  the  determination  and 
use of two simultaneous relations among a set of variables. It can be
seen that, if there are $p$ investigational variables, we require at least $p - 1$
instrumental variables in order to estimate the relationship. We shall
suppose that there are, in general, $p$ investigational variables $x_i$ and $q$
instrumental variables $y_j$. The instrumental variables may be differences
among  q  +  1  groups,  so  that  the  formal  regression  on  the  q  variables 
will  be  equivalent  in  such  cases  to  an  analysis  of  variance  within  and 
between  groups. 

Suppose that the functional relationship is defined by the null variate

$$\xi = \sum_i \gamma_i x_i$$

being uncorrelated with the $y_j$. We may standardize the scale of the
coefficients by specifying that

$$\sum_h \sum_i \gamma_h \gamma_i t_{hi} = 1,$$

so that there are but $p - 1$ independent coefficients. The sum of squares
for regression of $\xi$ on the $q$ variables $y_j$ then provides a test for the concordance of
the coefficients with the data. We write

$p_{\xi j}$ = sum of products of $\xi$ and $y_j$;

then the sum of squares for regression is

$$\sum_j \sum_k p_{\xi j}\, p_{\xi k}\, u^{jk}$$

and the total sum of squares for $\xi$ is

$$\sum_h \sum_i \gamma_h \gamma_i t_{hi} = 1.$$

Then the analysis of variance for testing the assigned coefficients takes the
form given in Table 11.3.

TABLE 11.3

                                               D.F.          Sum of Squares

Regression (test of assigned coefficients)      q            $\sum_j \sum_k p_{\xi j}\, p_{\xi k}\, u^{jk}$
Residual                                        n - q - 1    $1 - \sum_j \sum_k p_{\xi j}\, p_{\xi k}\, u^{jk}$

Total                                           n - 1        1

This analysis is satisfactory for testing the concordance of any assigned
values of the coefficients with the data. For the estimation of the coefficients,
however, the analysis differs slightly according as $q$ is less than,
equal to, or greater than $p - 1$.

(i)  q  =  p  -  1 

Since there are $p - 1$ independent coefficients and the regression sum
of squares has $p - 1$ degrees of freedom, we may estimate the coefficients
by equating the regression sum of squares to zero. Alternatively, and
more directly, we get $p - 1$ simultaneous equations by equating the sum
of products of $\xi$ with each of the $y_j$ (i.e., the $p_{\xi j}$) to zero. The equations are

$$\sum_i \gamma_i p_{ij} = 0 \qquad (j = 1, 2, \ldots, p - 1).$$

The solutions of these equations will be denoted by $c_i$.

(ii)  q  <  p  -  1 

When the number of instrumental variables is less than $p - 1$, we can
still determine the regression of the null variate $\xi$ on these variates and
then set up an analysis to test the concordance of the coefficients with the
data. It will not be possible, however, to determine a unique null function
from the sample, since it will be possible to determine $p - q$ independent
linear functions of the $x_i$ which are uncorrelated with the $y_j$. Thus,
although we can test a given null variate, we cannot estimate one uniquely
from the sample.

(iii)  q  >  p  -  1 

In this case, because the sum of squares for regression has more than
$p - 1$ degrees of freedom, it will not, in general, vanish for any choice of
the coefficients in the null variate. To estimate the coefficients from the
data, we choose them to minimize the regression sum of squares. The
equations of estimation are nonlinear, so that the sum of squares thus
minimized, although it has $q - p + 1$ degrees of freedom, is not distributed
as a sum of squares of independent normal variables. This sum of
squares nevertheless provides an approximate test of the residual correlation
of the $x_i$ with the $y_j$. If this is significant, we must conclude that
there is no null variate; that is, there is no underlying functional relation
that persists for varying values of the $y_j$.

It can be shown that the sum of squares for regression, minimized with
respect to the coefficients $\gamma_i$, is the smallest root $\theta$ of the determinantal
equation

$$\Bigl|\, \sum_j \sum_k p_{hj}\, p_{ik}\, u^{jk} - \theta\, t_{hi} \Bigr| = 0,$$

which may be written

$$|P U^{-1} P' - \theta T| = 0.$$

The coefficients $c_i$ are the elements of the corresponding characteristic
vector.
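
For two investigational variables the determinantal equation is a quadratic in $\theta$, and the smallest root and its vector can be found directly. The sketch below is illustrative only: the function name is hypothetical, `a` stands for the $2 \times 2$ matrix $P U^{-1} P'$ and `t` for $T$, both assumed symmetric with `t` positive definite, and the vector is scaled so that its first element is 1 (as in the null variate $x_1 + \gamma x_2$) rather than by the standardization used above. The input matrices are made-up numbers.

```python
import math


def smallest_root_2x2(a, t):
    """Smallest root theta of det(A - theta*T) = 0 for symmetric 2x2
    matrices, with the characteristic vector scaled to (1, c2).

    Assumes T is positive definite and a12 - theta*t12 is nonzero.
    """
    (a11, a12), (_, a22) = a
    (t11, t12), (_, t22) = t

    # det(A - theta*T) = qa*theta**2 + qb*theta + qc
    qa = t11 * t22 - t12 ** 2
    qb = -(a11 * t22 + a22 * t11 - 2.0 * a12 * t12)
    qc = a11 * a22 - a12 ** 2

    # qa > 0 for positive definite T, so the minus sign gives the smaller root.
    theta = (-qb - math.sqrt(qb * qb - 4.0 * qa * qc)) / (2.0 * qa)

    # First row of (A - theta*T) c = 0 with c = (1, c2).
    c2 = -(a11 - theta * t11) / (a12 - theta * t12)
    return theta, (1.0, c2)


# Hypothetical matrices, for illustration only.
theta, vector = smallest_root_2x2([[2.0, 1.0], [1.0, 2.0]],
                                  [[1.0, 0.0], [0.0, 1.0]])
print(theta, vector)
```

For more than two investigational variables a general routine for the symmetric generalized eigenvalue problem would be needed in place of the quadratic formula.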

Since the $p_{\xi j}$ have zero expectations, they may be regarded as values of a
new null variate, and the sum of squares for regression can then be
regarded as the sum of squares of this null variate. Various aspects of
the variation among the $p_{\xi j}$ may be studied. It would be advantageous if
the sum of squares could be separated into two parts: a sum of squares,
with $p - 1$ degrees of freedom, representing departures of the assigned
coefficients from those given by the data (test of direction); and (when
$q > p - 1$) one with $q - p + 1$ degrees of freedom, representing the
minimal association between the $x$ variables and $y$ variables. Such an
exact analysis does not seem to be possible, however; for if it were, it
would correspond to a partition due to the regression of the null variate
$p_{\xi j}$ on $p - 1$ "explanatory" variates. From a knowledge of the null
variate it is not possible, however, to define such a set of explanatory
variates. Consequently, only approximate tests are possible.

When  the  error  variances  and  covariances  are  based  on  a  large  number 
of  degrees  of  freedom,  these  variances  and  covariances  may  be  taken  as 
known  exactly,  and  then  the  analysis  follows  the  lines  given  in  Section 
11.3,  (ii). 

Example  11.2  Calibration  of  Bending  Mandrel  against  Cold  Check 
Tests  of  Lacquer  Surfaces.  Schrumpf,  Carter,  and  Hader  (1956)  report 
the  results  of  an  experiment  in  which  the  check  resistance  of  lacquer  surfaces 
was  tested  by  two  methods,  one  giving  the  average  number  of  cycles  of  flexing 
before  surface  failure,  the  other  giving  the  diameter  of  the  mandrel  over  which 
the specimen was bent for failure. Seven different lacquers, of differing composition
(percentage hard lacquer), were tested; the percentage of hard lacquer
is  thus  the  instrumental  variable  for  this  experiment.  Although  the  number  of 
results  is  small,  and  the  fiducial  limits  for  the  constant  in  the  relationship 
correspondingly  wide,  this  experiment  illustrates  well  the  use  of  an  instrumental 
variable  to  calibrate  the  tests. 

The experimental data are set out in Table 11.4. Table 11.5 gives the matrix
of sums of squares and products of average cycles ($x_1$), mandrel size ($x_2$), and
composition ($y$), and Table 11.6 gives the analysis of variance and covariance of
$x_1$ and $x_2$.

The null variate being $x_1 + \gamma x_2$, we find for the sample estimate of the constant

$$c = -p_1/p_2 = 146.51/11.925 = 12.29.$$

The calibration equation is

$$X_1 = 15.44 - 12.29 X_2.$$

TABLE 11.4

Check Resistance of Lacquer Surfaces: Calibration of Bending
Mandrel against Cold Check Tests

          Average Number      Average Mandrel        Composition, y,
          of Cycles, x1       Diameter, x2, in.      percentage hard

               2.2                 1.017                  32.0
               5.6                 0.892                  25.5
               7.2                 0.617                  20.0
               9.9                 0.558                  16.5
               9.6                 0.417                  14.0
              10.4                 0.442                  12.5
              11.3                 0.283                  11.0

Total         56.2                 4.226                 131.5

TABLE 11.5
Sums of Squares and Products of Values in Table 11.4

             x1            x2            y

x1          62.85        -4.932      -146.51
x2          -4.932        0.4201       11.925
y         -146.51        11.925       349.43

TABLE 11.6
Analysis of Variance and Covariance for $x_1$ and $x_2$

                               Sums of Squares and Products
                     D.F.      $x_1^2$      $x_1 x_2$      $x_2^2$

Regression on y        1       61.43       -5.000       0.4070
Residual               5        1.42       +0.068       0.0131

Total                  6       62.85       -4.932       0.4201

The 1 per cent point for F with 1 and 5 degrees of freedom is 16.26. The
fiducial limits for $\gamma$ are therefore the roots of the equation

$$\frac{5(61.43 - 10.000\gamma + 0.4070\gamma^2)}{1.42 + 0.136\gamma + 0.0131\gamma^2} = 16.26,$$

that is, 7.3 and 21.4.
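
The calculations of Example 11.2 can be retraced from the raw data of Table 11.4. The script below is a sketch: the helper `sp` and the variable names are illustrative, the 1 per cent point 16.26 is taken from the text, and because it works from unrounded sums its results agree with the tabulated figures only to rounding.

```python
import math

# Data of Table 11.4.
x1 = [2.2, 5.6, 7.2, 9.9, 9.6, 10.4, 11.3]              # average cycles
x2 = [1.017, 0.892, 0.617, 0.558, 0.417, 0.442, 0.283]  # mandrel diameter, in.
y = [32.0, 25.5, 20.0, 16.5, 14.0, 12.5, 11.0]          # percentage hard


def sp(a, b):
    """Corrected sum of products of two equal-length series."""
    return sum(u * v for u, v in zip(a, b)) - sum(a) * sum(b) / len(a)


n = len(y)
t11, t12, t22 = sp(x1, x1), sp(x1, x2), sp(x2, x2)
p1, p2, u = sp(x1, y), sp(x2, y), sp(y, y)

# Estimate of the constant in the null variate x1 + gamma*x2.
c = -p1 / p2
intercept = sum(x1) / n + c * sum(x2) / n   # calibration: X1 = intercept - c*X2

# Regression on y and residual parts (Table 11.6).
r11, r12, r22 = p1 * p1 / u, p1 * p2 / u, p2 * p2 / u
e11, e12, e22 = t11 - r11, t12 - r12, t22 - r22

# Fiducial limits: roots of
#   df*(r11 + 2*g*r12 + g**2*r22) = F*(e11 + 2*g*e12 + g**2*e22).
F, df = 16.26, n - 2
qa = df * r22 - F * e22
qb = 2.0 * (df * r12 - F * e12)
qc = df * r11 - F * e11
disc = math.sqrt(qb * qb - 4.0 * qa * qc)
lower, upper = sorted(((-qb - disc) / (2.0 * qa), (-qb + disc) / (2.0 * qa)))

print(f"c = {c:.2f}, intercept = {intercept:.2f}")
print(f"limits: {lower:.1f} to {upper:.1f}")
```

The two roots bound an interval only because the coefficient of $\gamma^2$ in the quadratic is positive here; were it negative, the limits would not enclose a finite range.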